Author manuscript; available in PMC: 2015 Nov 10.
Published in final edited form as: Biometrika. 2015 Apr 2;102(2):421–437. doi: 10.1093/biomet/asv006

Effective dimension reduction for sparse functional data

F YAO 1, E LEI 1, Y WU 2
PMCID: PMC4640368  NIHMSID: NIHMS733755  PMID: 26566293

Summary

We propose a method of effective dimension reduction for functional data, emphasizing the sparse design where one observes only a few noisy and irregular measurements for some or all of the subjects. The proposed method borrows strength across the entire sample and provides a way to characterize the effective dimension reduction space, via functional cumulative slicing. Our theoretical study reveals a bias-variance trade-off associated with the regularizing truncation and decaying structures of the predictor process and the effective dimension reduction space. A simulation study and an application illustrate the superior finite-sample performance of the method.

Keywords: Cumulative slicing, Effective dimension reduction, Inverse regression, Sparse functional data

1. Introduction

In functional data analysis, one is often interested in how a scalar response $Y \in \mathbb{R}$ varies with a smooth trajectory $X(t)$, where $t$ is an index variable defined on a closed interval $\mathcal{T}$; see Ramsay & Silverman (2005). To be specific, one seeks to model the relationship $Y = M(X; \epsilon)$, where $M$ is a smooth functional and the error process $\epsilon$ has zero mean and finite variance $\sigma^2$ and is independent of $X$. Although modelling $M$ parametrically can be restrictive in many applications, modelling $M$ nonparametrically is infeasible in practice due to slow convergence rates associated with the curse of dimensionality. Therefore a class of semiparametric index models has been proposed to approximate $M(X; \epsilon)$ with an unknown link function $g: \mathbb{R}^{K+1} \to \mathbb{R}$; that is,

$$Y = g(\langle \beta_1, X \rangle, \ldots, \langle \beta_K, X \rangle; \epsilon), \qquad (1)$$

where $K$ is the reduced dimension of the model, $\beta_1, \ldots, \beta_K$ are linearly independent index functions, and $\langle u, v \rangle = \int u(t) v(t) \, dt$ is the usual $L^2$ inner product. The functional linear model $Y = \beta_0 + \int \beta_1(t) X(t) \, dt + \epsilon$ is a special case and has been studied extensively (Cardot et al., 1999; Müller & Stadtmüller, 2005; Yao et al., 2005b; Cai & Hall, 2006; Hall & Horowitz, 2007; Yuan & Cai, 2010).

In this article, we tackle the index model (1) from the perspective of effective dimension reduction, in the sense that the $K$ linear projections $\langle \beta_1, X \rangle, \ldots, \langle \beta_K, X \rangle$ form a sufficient statistic. This is particularly useful when the process $X$ is infinite-dimensional. Our primary goal is to discuss dimension reduction for functional data, especially when the trajectories are corrupted by noise and are sparsely observed with only a few observations for some, or even all, of the subjects. Pioneered by Li (1991) for multivariate data, effective dimension reduction methods are typically link-free, requiring neither specification nor estimation of the link function (Duan & Li, 1991), and aim to characterize the $K$-dimensional effective dimension reduction space $S_{Y \mid X} = \mathrm{span}(\beta_1, \ldots, \beta_K)$ onto which $X$ is projected. Such index functions $\beta_k$ are called effective dimension reduction directions, $K$ is the structural dimension, and $S_{Y \mid X}$ is also known as the central subspace (Cook, 1998). Li (1991) characterized $S_{Y \mid X}$ via the inverse mean $E(X \mid Y)$ by sliced inverse regression, which has motivated much work for multivariate data. For instance, Cook & Weisberg (1991) estimated $\mathrm{var}(X \mid Y)$, Li (1992) dealt with the Hessian matrix of the regression curve, Xia et al. (2002) proposed minimum average variance estimation as an adaptive approach based on kernel methods, Chiaromonte et al. (2002) modified sliced inverse regression for categorical predictors, Li & Wang (2007) worked with empirical directions, and Zhu et al. (2010) proposed cumulative slicing estimation to improve on sliced inverse regression.

The literature on effective dimension reduction for functional data is relatively sparse. Ferré & Yao (2003) proposed functional sliced inverse regression for completely observed functional data, and Li & Hsing (2010) developed sequential $\chi^2$ testing procedures to decide the structural dimension of functional sliced inverse regression. Apart from effective dimension reduction approaches, James & Silverman (2005) estimated the index and link functions jointly for an additive form $g(\langle \beta_1, X \rangle, \ldots, \langle \beta_K, X \rangle; \epsilon) = \beta_0 + \sum_{k=1}^{K} g_k(\langle \beta_k, X \rangle) + \epsilon$, assuming that the trajectories are densely or completely observed and that the index and link functions are elements of a finite-dimensional spline space. Chen et al. (2011) estimated the index and additive link functions nonparametrically and relaxed the finite-dimensional assumption for theoretical analysis but retained the dense design.

Jiang et al. (2014) proposed an inverse regression method for sparse functional data by estimating the conditional mean $E\{X(t) \mid Y = \tilde{y}\}$ with a two-dimensional smoother applied to pooled observed values of $X$ in a local neighbourhood of $(t, \tilde{y})$. The computation associated with a two-dimensional smoother is considerable and further increased by the need to select two different bandwidths. In contrast, we aim to estimate the effective dimension reduction space by drawing inspiration from cumulative slicing for multivariate data (Zhu et al., 2010). When adapted to the functional setting, cumulative slicing offers a novel and computationally simple way of borrowing strength across subjects to handle sparsely observed trajectories. This advantage has not been exploited elsewhere. As we will demonstrate later, although extending cumulative slicing to completely observed functional data is straightforward, our method adopts a different strategy for the sparse design via a one-dimensional smoother, leading to potentially more effective use of the data.

2. Methodology

2·1. Dimension reduction for functional data

Let $\mathcal{T}$ be a compact interval, and let $X$ be a random variable defined on the real separable Hilbert space $H \equiv L^2(\mathcal{T})$ endowed with inner product $\langle f, g \rangle = \int_{\mathcal{T}} f(t) g(t) \, dt$ and norm $\|f\| = \langle f, f \rangle^{1/2}$. We assume that:

Assumption 1

$X$ is centred and has a finite fourth moment, $\int_{\mathcal{T}} E\{X^4(t)\} \, dt < \infty$.

Under Assumption 1, the covariance surface of $X$ is $\Sigma(s,t) = E\{X(s) X(t)\}$, which generates a Hilbert–Schmidt operator $\Sigma$ on $H$ that maps $f$ to $(\Sigma f)(t) = \int_{\mathcal{T}} \Sigma(s,t) f(s) \, ds$. This operator can be written succinctly as $\Sigma = E(X \otimes X)$, where the tensor product $u \otimes v$ denotes the rank-one operator on $H$ that maps $w$ to $(u \otimes v) w = \langle u, w \rangle v$. By Mercer's theorem, $\Sigma$ admits a spectral decomposition $\Sigma = \sum_{j=1}^{\infty} \alpha_j \, \phi_j \otimes \phi_j$, where the eigenfunctions $\{\phi_j\}_{j=1,2,\ldots}$ form a complete orthonormal system in $H$ and the eigenvalues $\{\alpha_j\}_{j=1,2,\ldots}$ are strictly decreasing and positive such that $\sum_{j=1}^{\infty} \alpha_j < \infty$. Finally, recall that the effective dimension reduction directions $\beta_1, \ldots, \beta_K$ in model (1) are linearly independent functions in $H$, and the response $Y \in \mathbb{R}$ is assumed to be conditionally independent of $X$ given the $K$ projections $\langle \beta_1, X \rangle, \ldots, \langle \beta_K, X \rangle$.

Zhu et al. (2010) observed that for a fixed $\tilde{y} \in \mathbb{R}$, using two slices $I_1 = (-\infty, \tilde{y}]$ and $I_2 = (\tilde{y}, +\infty)$ would maximize the use of data and minimize the variability in each slice. The kernel of the sliced inverse regression operator $\mathrm{var}\{E(X \mid Y)\}$ is estimated by the two-slice version $\Lambda_0(s,t;\tilde{y}) = m(s,\tilde{y}) \, m(t,\tilde{y})$, where $m(t,\tilde{y}) = E\{X(t) 1(Y \leq \tilde{y})\}$ is an unconditional expectation, in contrast to the conditional expectation $E\{X(t) \mid Y\}$ of functional sliced inverse regression. Since $\Lambda_0$ with a fixed $\tilde{y}$ spans at most one direction of $S_{Y \mid X}$, it is necessary to combine all possible estimates of $m(t,\tilde{y})$ by letting $\tilde{y}$ run across the support of $\tilde{Y}$, an independent copy of $Y$. Therefore, the kernel of the proposed functional cumulative slicing is

$$\Lambda(s,t) = E\{m(s,\tilde{Y}) \, m(t,\tilde{Y}) \, w(\tilde{Y})\}, \qquad (2)$$

where $w(\tilde{y})$ is a known nonnegative weight function. Denote the corresponding integral operator of $\Lambda(s,t)$ by $\Lambda$ also. The following theorem establishes the validity of our proposal. Analogous to the multivariate case, a linearity assumption is needed.

Assumption 2

For any function $b \in H$, there exist constants $c_0, \ldots, c_K \in \mathbb{R}$ such that

$$E(\langle b, X \rangle \mid \langle \beta_1, X \rangle, \ldots, \langle \beta_K, X \rangle) = c_0 + \sum_{k=1}^{K} c_k \langle \beta_k, X \rangle.$$

This assumption is satisfied when X has an elliptically contoured distribution, which is more general than, but has a close connection to, a Gaussian process (Cambanis et al., 1981; Li & Hsing, 2010).

Theorem 1

If Assumptions 1 and 2 hold for model (1), then the linear space spanned by $\{m(t,\tilde{y}) : \tilde{y} \in \mathbb{R}\}$ is contained in the linear space spanned by $\{\Sigma\beta_1, \ldots, \Sigma\beta_K\}$, i.e., $\mathrm{span}(\{m(t,\tilde{y}) : \tilde{y} \in \mathbb{R}\}) \subseteq \mathrm{span}(\Sigma\beta_1, \ldots, \Sigma\beta_K)$.

An important observation from Theorem 1 is that for any $b \in H$ orthogonal to the space spanned by $\{\Sigma\beta_1, \ldots, \Sigma\beta_K\}$ and for any $x \in H$, we have $\langle b, \Lambda x \rangle = 0$, implying that $\mathrm{range}(\Lambda) \subseteq \mathrm{span}(\Sigma\beta_1, \ldots, \Sigma\beta_K)$. If $\Lambda$ has $K$ nonzero eigenvalues, the space spanned by its eigenfunctions is precisely $\mathrm{span}(\Sigma\beta_1, \ldots, \Sigma\beta_K)$. Recall that our goal is to estimate the central subspace $S_{Y \mid X}$, even though the effective dimension reduction directions themselves are not identifiable. For specificity, we regard the eigenfunctions of $\Sigma^{-1}\Lambda$ associated with the $K$ largest nonzero eigenvalues as the index functions $\beta_1, \ldots, \beta_K$, unless stated otherwise.

As the covariance operator $\Sigma$ is Hilbert–Schmidt, it is not invertible when defined from $H$ to $H$. Similarly to Ferré & Yao (2005), let $R_{\Sigma}$ denote the range of $\Sigma$, and let $R_{\Sigma^{-1}} = \{b \in H : \sum_{j=1}^{\infty} \alpha_j^{-1} \langle b, \phi_j \rangle \phi_j \in H, \ b \in R_{\Sigma}\}$. Then $\Sigma$ is a one-to-one mapping from $R_{\Sigma^{-1}} \subseteq H$ onto $R_{\Sigma}$, with inverse $\Sigma^{-1} = \sum_{j=1}^{\infty} \alpha_j^{-1} \phi_j \otimes \phi_j$. This is reminiscent of finding a generalized inverse of a matrix. Let $\xi_j = \langle X, \phi_j \rangle$ denote the $j$th principal component, or generalized Fourier coefficient, of $X$, and assume that:

Assumption 3

$$\sum_{j=1}^{\infty} \sum_{l=1}^{\infty} \alpha_j^{-2} \alpha_l^{-1} E^2\big[ E\{\xi_j 1(Y \leq \tilde{Y}) \mid \tilde{Y}\} \, E\{\xi_l 1(Y \leq \tilde{Y}) \mid \tilde{Y}\} \big] < \infty.$$

Proposition 1

Under Assumptions 1–3, the eigenspace associated with the $K$ nonnull eigenvalues of $\Sigma^{-1}\Lambda$ is well-defined in $H$.

This is a direct analogue of Theorem 4.8 in He et al. (2003) and Theorem 2.1 in Ferré & Yao (2005).

2·2. Functional cumulative slicing for sparse functional data

For the data $\{(X_i, Y_i) : i = 1, \ldots, n\}$, independent and identically distributed as $(X, Y)$, the predictor trajectories $X_i$ are observed intermittently, contaminated with noise, and collected in the form of repeated measurements $\{(T_{ij}, U_{ij}) : i = 1, \ldots, n; \ j = 1, \ldots, N_i\}$, where $U_{ij} = X_i(T_{ij}) + \varepsilon_{ij}$ with measurement error $\varepsilon_{ij}$ that is independent and identically distributed as $\varepsilon$ with zero mean and constant variance $\sigma_x^2$, and independent of all other random variables. When only a few observations are available for some or even all subjects, individual smoothing to recover $X_i$ is infeasible and one must pool data across subjects for consistent estimation.

To estimate the functional cumulative slicing kernel $\Lambda$ in (2), the key quantity is the unconditional mean $m(t,\tilde{y}) = E\{X(t) 1(Y \leq \tilde{y})\}$. For sparsely and irregularly observed $X_i$, cross-sectional estimation as used in multivariate cumulative slicing is inapplicable. To maximize the use of available data, we propose to pool the repeated measurements across subjects via a scatterplot smoother, which works in conjunction with the strategy of cumulative slicing. We use a local linear estimator $\hat{m}(t,\tilde{y}) = \hat{a}_0$ (Fan & Gijbels, 1996), solving

$$\min_{(a_0, a_1)} \sum_{i=1}^{n} \sum_{j=1}^{N_i} \left\{ U_{ij} 1(Y_i \leq \tilde{y}) - a_0 - a_1 (T_{ij} - t) \right\}^2 K_1\!\left( \frac{T_{ij} - t}{h_1} \right), \qquad (3)$$

where $K_1$ is a nonnegative and symmetric univariate kernel density and $h_1 = h_1(n)$ is the bandwidth controlling the amount of smoothing. We ignore the dependence among data from the same individual (Lin & Carroll, 2000) and use leave-one-curve-out crossvalidation to select $h_1$ (Rice & Silverman, 1991). Then an estimator of the kernel function $\Lambda(s,t)$ is its sample moment

$$\hat{\Lambda}(s,t) = \frac{1}{n} \sum_{i=1}^{n} \hat{m}(s, Y_i) \, \hat{m}(t, Y_i) \, w(Y_i). \qquad (4)$$

The distinction between our method and that of Jiang et al. (2014) lies in the inverse function $m(t,y)$ which forms the effective dimension reduction space. It is notable that (4) is based on a univariate smoother that uses the effective data satisfying $\{T_{ij} \in (t - h_1, t + h_1), \ Y_i \leq y\}$, roughly of order $(n h_1)^{1/2}$ for estimating $m(t,y) = E\{X(t) 1(Y \leq y)\}$ in a sparse design with $E(N_n) < \infty$, $N_n$ being the number of repeated observations per subject. By contrast, equation (2·4) in Jiang et al. (2014) uses the data satisfying $\{T_{ij} \in (t - h_t, t + h_t), \ Y_i \in (y - h_y, y + h_y)\}$ for estimating $m(t,y) = E\{X(t) \mid Y = y\}$, roughly of order $(n h_t h_y)^{1/2}$. This is reflected in the faster convergence of the estimated operator $\hat{\Lambda}$ compared with $\hat{\Gamma}_e$ in Jiang et al. (2014), indicating potentially more effective use of the data based on univariate smoothing. The computation associated with a two-dimensional smoother is considerable and further exacerbated by the need to select two different bandwidths $h_t$ and $h_y$.
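To make the estimation steps in (3) and (4) concrete, the following minimal Python sketch pools the pairs $(T_{ij}, U_{ij} 1(Y_i \leq \tilde{y}))$ across subjects, fits a local linear smoother at each grid point, and then averages the products $\hat{m}(s, Y_i)\hat{m}(t, Y_i)$ with $w \equiv 1$. This is an illustration on simulated toy data, not the authors' implementation; the Epanechnikov kernel, the fixed bandwidth and the function name local_linear_m are our own choices, and bandwidth selection by leave-one-curve-out crossvalidation is omitted.

```python
import numpy as np

def local_linear_m(t_grid, y_tilde, T_list, U_list, Y, h1):
    """Pooled local linear estimate of m(t, y~) = E{X(t) 1(Y <= y~)} as in (3),
    evaluated on t_grid; Epanechnikov kernel, bandwidth h1."""
    Ts = np.concatenate(T_list)                               # all T_ij pooled
    resp = np.concatenate([U * float(y <= y_tilde) for U, y in zip(U_list, Y)])
    m_hat = np.empty(len(t_grid))
    for g, t in enumerate(t_grid):
        d = (Ts - t) / h1
        w = np.maximum(0.0, 0.75 * (1.0 - d ** 2))            # kernel weights K1
        X = np.column_stack([np.ones_like(Ts), Ts - t])       # local linear design
        WX = X * w[:, None]
        a = np.linalg.solve(WX.T @ X, WX.T @ resp)            # weighted least squares
        m_hat[g] = a[0]                                       # a0_hat
    return m_hat

# toy sparse data (illustrative only): a one-component process observed with noise
rng = np.random.default_rng(0)
n, t_grid, h1 = 200, np.linspace(0, 10, 51), 1.0
T_list, U_list, Y = [], [], np.empty(n)
for i in range(n):
    Ni = rng.integers(5, 11)
    Ti = np.sort(rng.uniform(0, 10, Ni))
    score = rng.normal()
    Y[i] = np.sin(score) + rng.normal(0, 0.5)
    U_list.append(score * np.sin(np.pi * Ti / 5) / np.sqrt(5) + rng.normal(0, 0.3, Ni))
    T_list.append(Ti)

# kernel estimate (4) with w = 1: average m_hat(s, Y_i) m_hat(t, Y_i) over i
M = np.vstack([local_linear_m(t_grid, yi, T_list, U_list, Y, h1) for yi in Y])
Lambda_hat = M.T @ M / n
```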

For the covariance operator $\Sigma$, following Yao et al. (2005a), denote the observed raw covariances by $G_i(T_{ij}, T_{il}) = U_{ij} U_{il}$. Since $E\{G_i(T_{ij}, T_{il}) \mid T_{ij}, T_{il}\} = \mathrm{cov}\{X(T_{ij}), X(T_{il})\} + \sigma_x^2 \delta_{jl}$, where $\delta_{jl}$ is 1 if $j = l$ and 0 otherwise, the diagonal of the raw covariances should be removed. Solving

$$\min_{(b_0, b_1, b_2)} \sum_{i=1}^{n} \sum_{j \neq l}^{N_i} \left\{ G_i(T_{ij}, T_{il}) - b_0 - b_1 (T_{ij} - s) - b_2 (T_{il} - t) \right\}^2 K_2\!\left( \frac{T_{ij} - s}{h_2}, \frac{T_{il} - t}{h_2} \right) \qquad (5)$$

yields $\hat{\Sigma}(s,t) = \hat{b}_0$, where $K_2$ is a nonnegative bivariate kernel density and $h_2 = h_2(n)$ is the bandwidth chosen by leave-one-curve-out crossvalidation; see Yao et al. (2005a) for details on the implementation. Since the inverse operator $\Sigma^{-1}$ is unbounded, we regularize by projection onto a truncated subspace. To be precise, let $s_n$ be a possibly divergent sequence and let $\Pi_{s_n} = \sum_{j=1}^{s_n} \phi_j \otimes \phi_j$ and $\hat{\Pi}_{s_n} = \sum_{j=1}^{s_n} \hat{\phi}_j \otimes \hat{\phi}_j$ denote the orthogonal projectors onto the eigensubspaces associated with the $s_n$ largest eigenvalues of $\Sigma$ and $\hat{\Sigma}$, respectively. Then $\Sigma_{s_n} = \Pi_{s_n} \Sigma \Pi_{s_n}$ and $\hat{\Sigma}_{s_n} = \hat{\Pi}_{s_n} \hat{\Sigma} \hat{\Pi}_{s_n}$ are two sequences of finite-rank operators converging to $\Sigma$ and $\hat{\Sigma}$ as $n \to \infty$, with bounded inverses $\Sigma_{s_n}^{-1} = \sum_{j=1}^{s_n} \alpha_j^{-1} \phi_j \otimes \phi_j$ and $\hat{\Sigma}_{s_n}^{-1} = \sum_{j=1}^{s_n} \hat{\alpha}_j^{-1} \hat{\phi}_j \otimes \hat{\phi}_j$, respectively. Finally, we obtain the eigenfunctions associated with the $K$ largest nonzero eigenvalues of $\hat{\Sigma}_{s_n}^{-1} \hat{\Lambda}$ as the estimates $\{\hat{\beta}_{k,s_n}\}_{k=1,\ldots,K}$ of the effective dimension reduction directions.
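The final step can be sketched as follows, assuming that $\hat{\Sigma}(s,t)$ and $\hat{\Lambda}(s,t)$ have already been evaluated on a common uniform grid. The discretization convention (an integral operator is approximated by its kernel matrix times the grid spacing) and the function name edr_directions are illustrative assumptions rather than part of the proposed method.

```python
import numpy as np

def edr_directions(Sigma_hat, Lambda_hat, t_grid, K, s_n):
    """Top-K eigenfunctions of the truncated operator Sigma_{s_n}^{-1} Lambda_hat,
    with integral operators discretized as kernel matrices times the grid spacing."""
    dt = t_grid[1] - t_grid[0]
    # spectral decomposition of the discretized covariance operator
    evals, evecs = np.linalg.eigh(Sigma_hat * dt)
    order = np.argsort(evals)[::-1][:s_n]
    alpha, E = evals[order], evecs[:, order]       # alpha_j_hat and unit eigenvectors
    # truncated inverse Sigma_{s_n}^{-1} as an operator matrix
    Sigma_inv_op = (E / alpha) @ E.T
    # compose with Lambda_hat and extract the leading eigenfunctions
    T_op = Sigma_inv_op @ (Lambda_hat * dt)
    w, v = np.linalg.eig(T_op)                     # non-symmetric operator, use eig
    idx = np.argsort(-np.abs(w))[:K]
    beta = np.real(v[:, idx])
    return beta / np.sqrt(np.sum(beta ** 2, axis=0) * dt)   # L2-normalized on the grid
```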

The situation for completely observed $X_i$ is similar to the multivariate case and considerably simpler. The quantities $m(t,\tilde{y})$ and $\Sigma(s,t)$ are easily estimated by their respective sample moments $\hat{m}(t,\tilde{y}) = n^{-1} \sum_{i=1}^{n} X_i(t) 1(Y_i \leq \tilde{y})$ and $\hat{\Sigma}(s,t) = n^{-1} \sum_{i=1}^{n} X_i(s) X_i(t)$, while the estimate of $\Lambda$ remains the same as (4). For densely observed $X_i$, individual smoothing can be used as a pre-processing step to recover smooth trajectories, and the estimation error introduced in this step can be shown to be asymptotically negligible under certain design conditions, i.e., it is equivalent to the ideal situation of completely observed $X_i$ (Hall et al., 2006).
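For completely observed curves stored on a common grid, these sample-moment estimators take only a few lines each; a hedged sketch follows, with the helper name complete_data_moments being our own.

```python
import numpy as np

def complete_data_moments(X, Y, t_grid):
    """Sample-moment estimates for fully observed curves:
    m_hat(t, y~) = mean_i X_i(t) 1(Y_i <= y~) and Sigma_hat(s, t) = mean_i X_i(s) X_i(t).
    X is an (n, G) array of curves evaluated on t_grid."""
    n = X.shape[0]
    Sigma_hat = X.T @ X / n
    # evaluate m_hat at the observed responses Y_i, as needed for (4)
    M = np.vstack([np.mean(X * (Y <= y)[:, None], axis=0) for y in Y])   # (n, G)
    Lambda_hat = M.T @ M / n          # (4) with w = 1
    return M, Sigma_hat, Lambda_hat
```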

For small values of $Y_i$, $\hat{m}(t, Y_i)$ obtained by (3) may be unstable due to the smaller number of pooled observations in the slice. A suitable weight function $w$ may be used to refine the estimator $\hat{\Lambda}(s,t)$. In our numerical studies, the naive choice of $w \equiv 1$ performed fairly well compared to other methods. Analogous to the multivariate case, choosing an optimal $w$ remains an open question.

Ferré & Yao (2005) avoided inverting $\Sigma$ with the claim that for a finite-rank operator $\Lambda$, $\mathrm{range}(\Lambda^{-1}\Sigma) = \mathrm{range}(\Sigma^{-1}\Lambda)$; however, Cook et al. (2010) showed that this requires more stringent conditions that are not easily fulfilled.

The selection of $K$ and $s_n$ deserves further study. For selecting the structural dimension $K$, the only relevant work to date is Li & Hsing (2010), where sequential $\chi^2$ tests are used to determine $K$ for the method of Ferré & Yao (2003). How to extend such tests to sparse functional data, if feasible at all, is worthy of further exploration. It is also important to tune the truncation parameter $s_n$, which contributes to the variance-bias trade-off of the resulting estimator, although analytical guidance for this is not yet available.

3. Asymptotic properties

In this section we present asymptotic properties of the functional cumulative slicing kernel operator and the effective dimension reduction directions for sparse functional data. The numbers of measurements $N_i$ and the observation times $T_{ij}$ are considered to be random, to reflect a sparse and irregular design. Specifically, we make the following assumption.

Assumption 4

The $N_i$ are independent and identically distributed as a positive discrete random variable $N_n$, where $E(N_n) < \infty$, $\mathrm{pr}(N_n \geq 2) > 0$ and $\mathrm{pr}(N_n \leq M_n) = 1$ for some constant sequence $M_n$ that is allowed to diverge, i.e., $M_n \to \infty$ as $n \to \infty$. Moreover, $(\{T_{ij} : j \in J_i\}, \{U_{ij} : j \in J_i\})$ are independent of $N_i$ for $J_i \subseteq \{1, \ldots, N_i\}$.

Writing $T_i = (T_{i1}, \ldots, T_{iN_i})^{\mathrm{T}}$ and $U_i = (U_{i1}, \ldots, U_{iN_i})^{\mathrm{T}}$, the data quadruplets $Z_i = \{T_i, U_i, Y_i, N_i\}$ are thus independent and identically distributed. Extremely sparse designs are also covered, with only a few measurements for each subject. Other regularity conditions are standard and listed in the Appendix, including assumptions on the smoothness of the mean and covariance functions of $X$, the distributions of the observation times, and the bandwidths and kernel functions used in the smoothing steps. Write $\|A\|_H^2 = \int_{\mathcal{T}} \int_{\mathcal{T}} A^2(s,t) \, ds \, dt$ for $A \in L^2(\mathcal{T} \times \mathcal{T})$.

Theorem 2

Under Assumptions 1, 4 and A1–A4 in the Appendix, we have

$$\|\hat{\Lambda} - \Lambda\|_H = O_p(n^{-1/2} h_1^{-1/2} + h_1^2), \qquad \|\hat{\Sigma} - \Sigma\|_H = O_p(n^{-1/2} h_2^{-1} + h_2^2).$$

The key result here is the $L^2$ convergence of the estimated operator $\hat{\Lambda}$, in which we exploit the projections of nonparametric U-statistics together with a decomposition of $\hat{m}(t,\tilde{y})$ to overcome the difficulty caused by the dependence among irregularly spaced measurements. The estimator $\hat{\Lambda}$ is obtained by averaging the smoothers $\hat{m}(t, Y_i)$ over $Y_i$, which is crucial for achieving the univariate convergence rate for this bivariate estimator. The convergence of the estimated covariance operator $\hat{\Sigma}$ is included for completeness, and is given in Theorem 2 of Yao & Müller (2010).

We are now ready to characterize the estimation of the central subspace $S_{Y \mid X} = \mathrm{span}(\beta_1, \ldots, \beta_K)$. Unlike the multivariate or finite-dimensional case, where the convergence of $\hat{S}_{Y \mid X}$ follows immediately from the convergence of $\hat{\Sigma}$ and $\hat{\Lambda}$ given a bounded $\Sigma^{-1}$, we have to approximate $\Sigma^{-1}$ with a sequence of truncated estimates $\hat{\Sigma}_{s_n}^{-1}$, which introduces additional variability and bias inherent in a functional inverse problem. Since we specifically regarded the index functions $\{\beta_1, \ldots, \beta_K\}$ as the eigenfunctions associated with the $K$ largest eigenvalues of $\Sigma^{-1}\Lambda$, their estimates are thus equivalent to $\hat{S}_{Y \mid X}$. For some constant $C > 0$, we require the eigenvalues of $\Sigma$ to satisfy the following condition:

Assumption 5

$\alpha_j > \alpha_{j+1} > 0$, $E(\xi_j^4) \leq C \alpha_j^2$, and $\alpha_j - \alpha_{j+1} \geq C^{-1} j^{-a-1}$ for $j \geq 1$.

This condition on the decaying speed of the eigenvalues $\alpha_j$ prevents the spacings between consecutive eigenvalues from being too small, and also implies that $\alpha_j \geq C^{-1} j^{-a}$ with $a > 1$ given the boundedness of $\Sigma$. Expressing the index functions as $\beta_k = \sum_{j=1}^{\infty} b_{kj} \phi_j$ $(k = 1, \ldots, K)$, we impose a decaying structure on the generalized Fourier coefficients $b_{kj} = \langle \beta_k, \phi_j \rangle$:

Assumption 6

$|b_{kj}| \leq C j^{-b}$ for $j \geq 1$ and $k = 1, \ldots, K$, where $b > 1/2$.

In order to accurately estimate the eigenfunctions $\phi_j$ from $\hat{\Sigma}$, one requires $j \leq \sup\{\ell : \alpha_\ell - \alpha_{\ell+1} > 2 \|\hat{\Sigma} - \Sigma\|_H\}$, i.e., that the distance from $\alpha_j$ to the nearest eigenvalue does not fall below $2\|\hat{\Sigma} - \Sigma\|_H$ (Hall & Hosseini-Nasab, 2006); this implicitly places an upper bound on the truncation parameter $s_n$. Given Assumption 5 and Theorem 2, we provide a sufficient condition on $s_n$. Here we write $c_{1n} \asymp c_{2n}$ when $c_{1n} = O(c_{2n})$ and $c_{2n} = O(c_{1n})$.

Assumption 7

As $n \to \infty$, $s_n^{a+1}(n^{-1/2} h_2^{-1} + h_2^2) \to 0$; moreover, if $h_2 \asymp n^{-1/6}$, then $s_n = o\{n^{1/(3a+3)}\}$.

Theorem 3

Under Assumptions 1–7 and A1–A4 in the Appendix, for all k = 1, … , K,

$$\|\hat{\beta}_{k,s_n} - \beta_k\| = O_p\big\{ s_n^{3a/2+1} (n^{-1/2} h_1^{-1/2} + h_1^2) + s_n^{2a+3/2} (n^{-1/2} h_2^{-1} + h_2^2) + s_n^{-b+1/2} \big\}. \qquad (6)$$

This result associates the convergence of $\hat{\beta}_{k,s_n}$ with the truncation parameter $s_n$ and the decay rates of $\alpha_j$ and $b_{kj}$, indicating a bias-variance trade-off with respect to $s_n$. One can view $s_n$ as a tuning parameter that is allowed to diverge slowly and which controls the resolution of the covariance estimation. Specifically, the first two terms on the right-hand side of (6) are attributed to the variability of estimating $\Sigma_{s_n}^{-1}\Lambda$ with $\hat{\Sigma}_{s_n}^{-1}\hat{\Lambda}$, and the last term corresponds to the approximation bias of $\Sigma_{s_n}^{-1}\Lambda$. The first term of the variance is due to $\|\hat{\Sigma}_{s_n}^{-1}\hat{\Lambda}\hat{\Sigma}_{s_n}^{-1/2} - \hat{\Sigma}_{s_n}^{-1}\Lambda\hat{\Sigma}_{s_n}^{-1/2}\|_H$ and becomes increasingly unstable with a larger truncation. The second part of the variance is due to $\|(\Sigma_{s_n}^{-1} - \hat{\Sigma}_{s_n}^{-1})\Lambda\Sigma_{s_n}^{-1/2}\|_H$, and the approximation bias is determined by the smoothness of $\beta_k$; for instance, a smoother $\beta_k$ with a larger $b$ leads to a smaller bias.

4. Simulations

In this section we illustrate the performance of the proposed functional cumulative slicing method in terms of estimation and prediction. Although our proposal is link-free for estimating the index functions $\beta_k$, a general index model (1) may lead to model predictions with high variability, especially given the relatively small sample sizes frequently encountered in functional data analysis. Thus we follow Chen et al. (2011) in assuming an additive structure for the link function $g$ in (1), i.e., $Y = \beta_0 + \sum_{k=1}^{K} g_k(\langle \beta_k, X \rangle) + \epsilon$. In each Monte Carlo run, a sample of $n = 200$ functional trajectories is generated from the process $X_i(t) = \sum_{j=1}^{50} \xi_{ij} \phi_j(t)$, where $\phi_j(t) = \sin(\pi t j / 5)/\sqrt{5}$ for $j$ even and $\phi_j(t) = \cos(\pi t j / 5)/\sqrt{5}$ for $j$ odd, the functional principal component scores $\xi_{ij}$ are independent and identically distributed as $N(0, j^{-1\cdot5})$, and $\mathcal{T} = [0, 10]$. For the setting of sparsely observed functional data, the number of observations per subject, $N_i$, is chosen uniformly from $\{5, \ldots, 10\}$, the observation times $T_{ij}$ are independent and identically distributed as $\mathrm{Un}[0, 10]$, and the measurement errors $\varepsilon_{ij}$ are independent and identically distributed as $N(0, 0\cdot1)$. The effective dimension reduction directions are generated by $\beta_1(t) = \sum_{j=1}^{50} b_j \phi_j(t)$, where $b_j = 1$ for $j = 1, 2, 3$ and $b_j = 4(j-2)^{-3}$ for $j = 4, \ldots, 50$, and $\beta_2(t) = 0\cdot3^{1/2}(t/5 - 1)$, which cannot be represented with finitely many Fourier terms. The following single- and multiple-index models are considered, with a data-generation sketch given after the list:

$$\begin{aligned}
\text{Model I:} &\quad Y = \sin(\pi \langle \beta_1, X \rangle / 4) + \epsilon, \\
\text{Model II:} &\quad Y = \arctan(\pi \langle \beta_1, X \rangle / 2) + \epsilon, \\
\text{Model III:} &\quad Y = \sin(\pi \langle \beta_1, X \rangle / 3) + \exp(\langle \beta_2, X \rangle / 3) + \epsilon, \\
\text{Model IV:} &\quad Y = \arctan(\pi \langle \beta_1, X \rangle) + \sin(\pi \langle \beta_2, X \rangle / 6) / 2 + \epsilon,
\end{aligned}$$

where the regression error ϵ is independent and identically distributed as N(0, 1) for all models.
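For concreteness, the following Python sketch (not the authors' code) generates one Monte Carlo sample under the sparse design described above for Model I; only the data-generation step is shown, and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, J = 200, 50

def phi(j, t):
    """Fourier basis on [0, 10]: sine for even j, cosine for odd j (1-based index)."""
    return (np.sin(np.pi * t * j / 5) if j % 2 == 0 else np.cos(np.pi * t * j / 5)) / np.sqrt(5)

# coefficients of beta_1 = sum_j b_j phi_j
b = np.array([1.0 if j <= 3 else 4.0 * (j - 2) ** (-3) for j in range(1, J + 1)])

T_list, U_list, Y = [], [], np.empty(n)
for i in range(n):
    xi = rng.normal(0.0, np.arange(1, J + 1) ** (-0.75))     # sd = j^{-0.75}, var = j^{-1.5}
    beta1_X = np.sum(b * xi)                 # <beta_1, X_i> by orthonormality of {phi_j}
    Y[i] = np.sin(np.pi * beta1_X / 4) + rng.normal(0, 1)    # Model I
    Ni = rng.integers(5, 11)                                 # 5 to 10 observations
    Ti = np.sort(rng.uniform(0, 10, Ni))
    Xi = sum(xi[j - 1] * phi(j, Ti) for j in range(1, J + 1))
    U_list.append(Xi + rng.normal(0, np.sqrt(0.1), Ni))      # measurement error N(0, 0.1)
    T_list.append(Ti)
```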

We compare our method with that of Jiang et al. (2014) for sparse functional data in terms of estimation and prediction. Denote the true structural dimension by $K_0$. Due to the nonidentifiability of the $\beta_k$, we examine the projection operators of the effective dimension reduction space, i.e., $P = \sum_{k=1}^{K_0} \beta_k \otimes \beta_k$ and $\hat{P}_{K,s_n} = \sum_{k=1}^{K} \hat{\beta}_{k,s_n} \otimes \hat{\beta}_{k,s_n}$. To assess the estimation of the effective dimension reduction space, we calculate $\|\hat{P}_{K,s_n} - P\|_H$ as the estimation error. To assess model prediction, we estimate the link functions $g_k$ nonparametrically by fitting a generalized additive model $Y_i = \beta_0 + \sum_{k=1}^{K} g_k(Z_{ik}) + \epsilon_i$ (Hastie & Tibshirani, 1990), where $Z_{ik} = \langle \hat{\beta}_{k,s_n}, \tilde{X}_i \rangle$ with $\tilde{X}_i$ the best linear unbiased predictor of $X_i$ (Yao et al., 2005a). We generate a validation sample of size 500 in each Monte Carlo run and calculate the average of the relative prediction errors, $500^{-1} \sum_{i=1}^{500} (\hat{Y}_i - Y_i)^2 / \sigma^2$, over different values of $(K, s_n)$, where $\sigma^2 = 1$ and $\hat{Y}_i = \hat{\beta}_0 + \sum_{k=1}^{K} \hat{g}_k(Z_{ik})$ with $Z_{ik} = \langle \hat{\beta}_{k,s_n}, X_i \rangle$, the $X_i$ being the underlying trajectories in the testing sample. We report in Table 1 the average estimation and prediction errors, minimized over $(K, s_n)$, along with their standard errors over 100 Monte Carlo repetitions. For both estimation and prediction, both methods selected $(K, s_n) = (1, 3)$ for the single-index models I and II, and $(K, s_n) = (2, 2)$ for the multiple-index models III and IV. The two approaches perform comparably in this sparse setting, which could be because the inverse covariance estimation dominates the overall performance. Our method takes one-third of the computation time of the method of Jiang et al. (2014) for this sparse design.
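A small sketch of the estimation-error criterion $\|\hat{P}_{K,s_n} - P\|_H$ is given below, assuming that the estimated and true directions are available as columns of arrays evaluated on a common uniform grid; the helper name projection_error is our own.

```python
import numpy as np

def projection_error(beta_hat, beta_true, t_grid):
    """Hilbert-Schmidt distance between the projections onto span(beta_hat) and
    span(beta_true); index functions are stored column-wise on a uniform grid."""
    dt = t_grid[1] - t_grid[0]

    def proj(B):
        # L2-orthonormalize the columns, then form the discretized projection operator
        Q, _ = np.linalg.qr(B * np.sqrt(dt))   # Euclidean-orthonormal columns
        return Q @ Q.T                         # = projection kernel(s, t) times dt

    D = proj(beta_hat) - proj(beta_true)
    return np.sqrt(np.sum(D ** 2))             # approximates ||P_hat - P||_H

# example with a single direction: beta_2 from the simulation design above
t = np.linspace(0, 10, 101)
beta_true = (0.3 ** 0.5 * (t / 5 - 1)).reshape(-1, 1)
beta_hat = beta_true + 0.1 * np.sin(t).reshape(-1, 1)   # a perturbed estimate
print(projection_error(beta_hat, beta_true, t))
```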

Table 1.

Estimation error and relative prediction error, multiplied by 100, obtained from 100 Monte Carlo repetitions (with standard errors in parentheses) for sparse functional data

Model   Estimation error            Prediction error
        FCS          IRLD           FCS          IRLD
I       61·1 (1·1)   61·3 (1·1)     17·7 (0·6)   17·9 (0·5)
II      59·3 (1·0)   59·5 (1·0)     19·6 (0·6)   19·4 (0·5)
III     63·7 (0·8)   63·9 (0·9)     18·8 (0·5)   19·5 (0·4)
IV      63·8 (0·8)   63·9 (0·9)     45·2 (1·1)   45·4 (1·1)

FCS, functional cumulative slicing; IRLD, the method of Jiang et al. (2014), where (K, sn) is selected by minimizing the estimation and prediction errors.

We also present simulation results for dense functional data, where $N_i = 50$ and the $T_{ij}$ are sampled independently and identically from $\mathrm{Un}[0, 10]$. With $(K, s_n)$ selected so as to minimize the estimation and prediction errors, we compare our proposal with the method of Jiang et al. (2014), functional sliced inverse regression (Ferré & Yao, 2003) using five or ten slices, and the functional index model of Chen et al. (2011). Table 2 indicates that our method slightly outperforms the method of Jiang et al. (2014), followed by the method of Chen et al. (2011), while functional sliced inverse regression (Ferré & Yao, 2003) is seen to be suboptimal. Our method takes only one-sixth of the time required by the method of Jiang et al. (2014) in this setting.

Table 2.

Estimation error and relative prediction error, multiplied by 100, obtained from 100 Monte Carlo repetitions (with standard errors in parentheses) for dense functional data

Metric              Model   FCS          IRLD         FSIR5        FSIR10       FIND
Estimation error    I       39·2 (1·6)   45·5 (1·5)   59·4 (2·1)   61·7 (2·2)   47·1 (1·6)
                    II      35·5 (1·4)   38·1 (1·3)   56·1 (1·8)   57·8 (1·9)   44·5 (1·5)
                    III     59·6 (0·8)   63·1 (0·8)   72·6 (1·1)   74·1 (1·3)   63·6 (0·9)
                    IV      57·2 (0·6)   59·0 (0·6)   69·3 (1·0)   68·9 (0·9)   61·0 (0·8)
Prediction error    I       11·1 (0·6)   12·7 (0·5)   17·1 (0·7)   16·7 (0·6)   16·1 (1·1)
                    II       9·8 (0·5)   10·5 (0·4)   15·5 (0·7)   16·9 (1·0)   14·9 (0·8)
                    III     13·5 (0·5)   15·2 (0·5)   15·8 (0·6)   16·6 (0·5)   14·7 (0·6)
                    IV      19·9 (0·7)   21·9 (0·7)   31·1 (1·4)   32·2 (1·4)   24·2 (1·2)

FCS, functional cumulative slicing; IRLD, inverse regression for longitudinal data (Jiang et al., 2014); FSIR5, functional sliced inverse regression (Ferré & Yao, 2003) with five slices; FSIR10, functional sliced inverse regression (Ferré & Yao, 2003) with ten slices; FIND, functional index model (Chen et al., 2011).

5. Data application

In this application, we study the relationship between the winning bid price of 156 Palm M515 PDA devices auctioned on eBay between March and May of 2003 and the bidding history over the seven-day period of each auction. Each observation from a bidding history represents a live bid, the actual price a winning bidder would pay for the device, known as the willingness-to-pay price. Further details on the bidding mechanism can be found in Liu & Müller (2009). We adopt the view that the bidding histories are independent and identically distributed realizations of a smooth underlying price process. Due to the nature of online auctions, the jth bid of the ith auction usually arrives irregularly at time Tij, and the number of bids Ni can vary widely, from nine to 52 for this dataset. As is usual in modelling prices, we take the log-transform of the bid prices. Figure 1 shows a sample of nine randomly selected bid histories over the seven-day period of the respective auction. Typically, the bid histories are sparse until the final hours of each auction, when bid sniping occurs. At this point, snipers place their bids at the last possible moment to try to deny competing bidders the chance of placing a higher bid.

Fig. 1. Observed bid prices over the seven-day auction period of nine randomly selected auctions, after log-transform.

Since our main interest is in the predictive power of price histories up to time $T$ for the winning bid prices, we consider the regression of the winning price on the history trajectory $X(t)$ $(t \in [0, T])$, and set $T = 4\cdot5, 4\cdot6, 4\cdot7, \ldots, 6\cdot8$ days. For each analysis on the domain $[0, T]$, we select the optimal structural dimension $K$ and truncation parameter $s_n$ by minimizing the average five-fold crossvalidated prediction error over 20 random partitions. Figure 2(a) shows the minimized average crossvalidated prediction errors, compared with those obtained using the method of Jiang et al. (2014). As the bidding histories encompass more data and prediction power increases, the proposed method appears to yield more favourable prediction across the different time domains.
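The tuning step can be sketched as follows in Python; the data container (a list of per-subject records with a 'Y' entry) and the routine fit_predict, which stands in for the estimation and generalized additive fitting steps described above, are hypothetical placeholders rather than the authors' implementation.

```python
import numpy as np
from itertools import product

def select_K_sn(data, fit_predict, K_grid=(1, 2, 3), sn_grid=(2, 3, 4, 5),
                n_partitions=20, n_folds=5, seed=0):
    """Choose (K, s_n) by minimizing the five-fold crossvalidated prediction error
    averaged over repeated random partitions. `data` is a list of per-subject records
    with a 'Y' entry; `fit_predict(train, test, K, sn)` returns predictions of the
    responses in `test` (both interfaces are hypothetical)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    best, best_err = None, np.inf
    for K, sn in product(K_grid, sn_grid):
        errs = []
        for _ in range(n_partitions):
            for hold in np.array_split(rng.permutation(n), n_folds):
                hold_set = set(hold.tolist())
                train = [data[i] for i in range(n) if i not in hold_set]
                test = [data[i] for i in hold]
                y_hat = np.asarray(fit_predict(train, test, K, sn))
                y_true = np.array([rec["Y"] for rec in test])
                errs.append(np.mean((y_hat - y_true) ** 2))
        if np.mean(errs) < best_err:
            best, best_err = (K, sn), np.mean(errs)
    return best, best_err
```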

Fig. 2. (a) Average five-fold crossvalidated prediction errors for functional cumulative slicing (circles) and the method of Jiang et al. (2014) (diamonds) over 20 random partitions across different time domains [0, T], for sparse eBay auction data. (b) Estimated model components for eBay auction data using functional cumulative slicing with K = 2 and s_n = 2; the upper panels show the estimated index functions, i.e., the effective dimension reduction directions, and the lower panels show the additive link functions.

As an illustration, we present the analysis for $T = 6$. The estimated model components using the proposed method are shown in Fig. 2(b), with the parameters chosen as $K = 2$ and $s_n = 2$. The first index function assigns contrasting weights to bids made before and after the first day, indicating that some bidders tend to underbid at the beginning only to quickly overbid relative to the mean. The second index represents a cautious type of bidding behaviour, entering at a lower price and slowly increasing towards the average level. These features contribute most towards the prediction of the winning bid prices. Also seen are the nonlinear patterns in the estimated additive link functions. Using these estimated model components, we display in Fig. 3(a) the additive surface $\hat{\beta}_0 + \hat{g}_1(\langle \hat{\beta}_1, X \rangle) + \hat{g}_2(\langle \hat{\beta}_2, X \rangle)$. We also fit an unstructured index model $g(\langle \beta_1, X \rangle, \langle \beta_2, X \rangle)$, where $g$ is nonparametrically estimated using a bivariate local linear smoother; this is shown in Fig. 3(b), and is seen to agree reasonably well with the additive regression surface.

Fig. 3. Fitted regression surfaces for the eBay data: (a) additive; (b) unstructured.

Acknowledgement

We thank two reviewers, an associate editor, and the editor for their helpful comments. This research was partially supported by the U.S. National Institutes of Health and National Science Foundation, and the Natural Sciences and Engineering Research Council of Canada.

Appendix

Regularity conditions and auxiliary lemmas

Without loss of generality, we assume that the known weight function is $w(\cdot) \equiv 1$. Write $\mathcal{T} = [a, b]$ and $\mathcal{T}^{\delta} = [a - \delta, b + \delta]$ for some $\delta > 0$; denote a single observation time by $T$, with density $f_1(t)$, and a pair of observation times by $(T_1, T_2)^{\mathrm{T}}$, with density $f_2(s, t)$. Recall the unconditional mean function $m(t, y) = E\{X(t) 1(Y \leq y)\}$. The regularity conditions for the underlying moment functions and design densities are as follows, where $\ell_1$ and $\ell_2$ are nonnegative integers. We assume that:

Assumption A1

2∑/(∂s1t2) is continuous on Tδ×Tδ for 1 + 2 = 2, and ∂2m/∂t2 is bounded and continuous with respect to tT for all yR.

Assumption A2

$f_1^{(1)}(t)$ is continuous on $\mathcal{T}^{\delta}$ with $f_1(t) > 0$, and $\partial f_2 / (\partial s^{\ell_1} \partial t^{\ell_2})$ is continuous on $\mathcal{T}^{\delta} \times \mathcal{T}^{\delta}$ for $\ell_1 + \ell_2 = 1$ with $f_2(s,t) > 0$.

Assumption A1 can be guaranteed by a twice-differentiable process, and Assumption A2 is standard and implies the boundedness and Lipschitz continuity of $f_1$. Recall the bandwidths $h_1$ and $h_2$ used in the smoothing steps for $\hat{m}$ in (3) and $\hat{\Sigma}$ in (5), respectively; we assume that:

Assumption A3

$h_1 \to 0$, $h_2 \to 0$, $n h_1^3 / \log n \to \infty$, and $n h_2^2 \to \infty$.

We say that a bivariate kernel function $K_2$ is of order $(\nu, \ell)$, where $\nu$ is a multi-index $\nu = (\nu_1, \nu_2)^{\mathrm{T}}$, if

$$\int\!\!\int u^{\ell_1} v^{\ell_2} K_2(u,v) \, du \, dv \ \begin{cases} = 0, & 0 \leq \ell_1 + \ell_2 < \ell, \ \ell_1 \neq \nu_1, \ \ell_2 \neq \nu_2, \\ = (-1)^{|\nu|} \nu_1! \, \nu_2!, & \ell_1 = \nu_1, \ \ell_2 = \nu_2, \\ \neq 0, & \ell_1 + \ell_2 = \ell, \end{cases}$$

where $|\nu| = \nu_1 + \nu_2 < \ell$. The univariate kernel $K_1$ is said to be of order $(\nu, \ell)$ for a univariate $\nu = \nu_1$ if this definition holds with $\ell_2 = 0$ on the right-hand side, integrating only over the argument $u$ on the left-hand side. The following standard conditions on the kernel densities are required.

Assumption A4

The kernel functions $K_1$ and $K_2$ are nonnegative with compact supports, bounded, and of order $(0, 2)$ and $\{(0, 0)^{\mathrm{T}}, 2\}$, respectively.

Lemma A1 is a mean-squared version of Theorem 1 in Martins-Filho & Yao (2006), which asserts the asymptotic equivalence of a nonparametric V-statistic to the projection of the corresponding U-statistic. Lemma A2 is a restatement of Lemma 1(b) of Martins-Filho & Yao (2007) adapted to sparse functional data.

Lemma A1

Let $\{Z_i\}_{i=1}^{n}$ be a sequence of independent and identically distributed random variables, and let $u_n$ and $v_n$ be U- and V-statistics with kernel function $\psi_n(Z_1, \ldots, Z_k)$. In addition, let $\hat{u}_n = n^{-1} k \sum_{i=1}^{n} \{\psi_{1n}(Z_i) - \phi_n\} + \phi_n$, where $\psi_{1n}(Z_i) = E\{\psi_n(Z_{i_1}, \ldots, Z_{i_k}) \mid Z_i\}$ for $i \in \{i_1, \ldots, i_k\}$ and $\phi_n = E\{\psi_n(Z_1, \ldots, Z_k)\}$. If $E\{\psi_n^2(Z_1, \ldots, Z_k)\} = o(n)$, then $n E\{(v_n - \hat{u}_n)^2\} = o(1)$.

Lemma A2

Given Assumptions 1–4 and A1–A4, let

$$s_k(t) = \sum_{i=1}^{n} \sum_{j=1}^{M_n} \frac{1(N_i \geq j)}{n h_1} K_1\!\left( \frac{T_{ij} - t}{h_1} \right) \left( \frac{T_{ij} - t}{h_1} \right)^{k}.$$

Then $\sup_{t \in \mathcal{T}} h_1^{-1} \left| s_k(t) - E\{s_k(t)\} \right| = O_p(1)$ for $k = 0, 1, 2$.

Proofs of the theorems

Proof of Theorem 1

This theorem is an analogue of Theorem 1 in Zhu et al. (2010); thus its proof is omitted.

Proof of Theorem 2

For brevity, we write $M_n$ and $N_n$ as $M$ and $N$, respectively. Let

$$S_n(t) = \sum_{i=1}^{n} \sum_{j=1}^{M} \frac{1(N_i \geq j)}{n h_1 E(N)} K_1\!\left( \frac{T_{ij} - t}{h_1} \right) \begin{pmatrix} 1 & (T_{ij} - t)/h_1 \\ (T_{ij} - t)/h_1 & \{(T_{ij} - t)/h_1\}^2 \end{pmatrix}, \qquad S(t) = \begin{pmatrix} f_T(t) & 0 \\ 0 & f_T(t) \sigma_K^2 \end{pmatrix},$$

where $\sigma_K^2 = \int u^2 K_1(u) \, du$. The local linear estimator of $m(t, \tilde{y})$ with kernel $K_1$ is

$$\hat{m}(t, \tilde{y}) = (1, 0) \, S_n^{-1}(t) \begin{pmatrix} \sum_i \sum_j 1(N_i \geq j) \{n h_1 E(N)\}^{-1} K_1\{(T_{ij} - t)/h_1\} \, U_{ij} 1(Y_i \leq \tilde{y}) \\ \sum_i \sum_j 1(N_i \geq j) \{n h_1 E(N)\}^{-1} K_1\{(T_{ij} - t)/h_1\} \{(T_{ij} - t)/h_1\} \, U_{ij} 1(Y_i \leq \tilde{y}) \end{pmatrix}.$$

Let $U_{ij}(t, \tilde{y}) = U_{ij} 1(Y_i \leq \tilde{y}) - m(t, \tilde{y}) - m^{(1)}(t, \tilde{y})(T_{ij} - t)$ and $W_n(z, t) = (1, 0) S_n^{-1}(t) (1, z)^{\mathrm{T}} K_1(z)$. Then

$$\hat{m}(t, \tilde{y}) - m(t, \tilde{y}) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{M} \frac{1(N_i \geq j)}{h_1 E(N)} W_n\!\left( \frac{T_{ij} - t}{h_1}, t \right) U_{ij}(t, \tilde{y}).$$

Denote a point between $T_{ij}$ and $t$ by $t_{ij}$; by Taylor expansion, $U_{ij}(t, \tilde{y}) = U_{ij} 1(Y_i \leq \tilde{y}) - m(T_{ij}, \tilde{y}) + m^{(2)}(t_{ij}, \tilde{y}) (T_{ij} - t)^2 / 2$. Finally, let $e_{ij}(\tilde{y}) = U_{ij} 1(Y_i \leq \tilde{y}) - m(T_{ij}, \tilde{y})$. Then

$$\begin{aligned} \hat{m}(t, \tilde{y}) - m(t, \tilde{y}) = {} & \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{M} \frac{1(N_i \geq j)}{h_1 E(N) f_T(t)} K_1\!\left( \frac{T_{ij} - t}{h_1} \right) e_{ij}(\tilde{y}) \\ & + \frac{1}{2n} \sum_{i=1}^{n} \sum_{j=1}^{M} \frac{h_1 1(N_i \geq j)}{E(N) f_T(t)} K_1\!\left( \frac{T_{ij} - t}{h_1} \right) \left( \frac{T_{ij} - t}{h_1} \right)^2 m^{(2)}(t_{ij}, \tilde{y}) + A_n(t, \tilde{y}), \end{aligned}$$

where

$$A_n(t, \tilde{y}) = \hat{m}(t, \tilde{y}) - m(t, \tilde{y}) - \{n h_1 E(N) f_T(t)\}^{-1} \sum_{i} \sum_{j} 1(N_i \geq j) K_1\{(T_{ij} - t)/h_1\} \, U_{ij}(t, \tilde{y}).$$

This allows us to write $\hat{\Lambda}(s,t) - \Lambda(s,t) = I_{1n}(s,t) + I_{2n}(s,t) + I_{3n}(s,t)$, where

$$\begin{aligned} I_{1n}(s,t) &= \frac{1}{n} \sum_{k=1}^{n} \big[ m(s, Y_k) \{\hat{m}(t, Y_k) - m(t, Y_k)\} + m(t, Y_k) \{\hat{m}(s, Y_k) - m(s, Y_k)\} \big], \\ I_{2n}(s,t) &= \frac{1}{n} \sum_{k=1}^{n} \{\hat{m}(s, Y_k) - m(s, Y_k)\} \{\hat{m}(t, Y_k) - m(t, Y_k)\}, \\ I_{3n}(s,t) &= \frac{1}{n} \sum_{k=1}^{n} m(s, Y_k) \, m(t, Y_k) - \Lambda(s,t), \end{aligned}$$

which implies, by the Cauchy–Schwarz inequality, that $\|\hat{\Lambda} - \Lambda\|_H^2 = O_p(\|I_{1n}\|_H^2 + \|I_{2n}\|_H^2 + \|I_{3n}\|_H^2)$. In the rest of the proofs, we drop the subscript $H$ and the dummy variables in integrals for brevity. Recall that we defined $Z_i$ as the underlying data quadruplet $(T_i, U_i, Y_i, N_i)$. Further, let $\sum^{(p)} h_{i_1, \ldots, i_p}$ denote the sum of $h_{i_1, \ldots, i_p}$ over the permutations of $i_1, \ldots, i_p$. Finally, by Assumptions A1, A2 and A4, write $0 < B_T^L \leq f_T(t) \leq B_T^U < \infty$ for the lower and upper bounds of the density function of $T$, $|K_1(x)| \leq B_K < \infty$ for the bound on the kernel function $K_1$, and $|\partial^2 m / \partial t^2| \leq B_{2m} < \infty$ for the bound on the second partial derivative of $m(t, \tilde{y})$ with respect to $t$.

(a) We further decompose $I_{1n}(s,t)$ into $I_{1n}(s,t) = I_{11n}(s,t) + I_{12n}(s,t) + I_{13n}(s,t)$, where

$$\begin{aligned} I_{11n}(s,t) = {} & \frac{1}{n^2} \sum_{k=1}^{n} \sum_{i=1}^{n} \sum_{j=1}^{M} \left[ \frac{1(N_i \geq j)}{h_1 E(N) f_T(t)} K_1\!\left( \frac{T_{ij} - t}{h_1} \right) e_{ij}(Y_k) \, m(s, Y_k) + \frac{1(N_i \geq j)}{h_1 E(N) f_T(s)} K_1\!\left( \frac{T_{ij} - s}{h_1} \right) e_{ij}(Y_k) \, m(t, Y_k) \right], \\ I_{12n}(s,t) = {} & \frac{1}{2n^2} \sum_{k=1}^{n} \sum_{i=1}^{n} \sum_{j=1}^{M} \left[ \frac{h_1 1(N_i \geq j)}{E(N) f_T(t)} K_1\!\left( \frac{T_{ij} - t}{h_1} \right) \left( \frac{T_{ij} - t}{h_1} \right)^2 m^{(2)}(t_{ij}, Y_k) \, m(s, Y_k) + \frac{h_1 1(N_i \geq j)}{E(N) f_T(s)} K_1\!\left( \frac{T_{ij} - s}{h_1} \right) \left( \frac{T_{ij} - s}{h_1} \right)^2 m^{(2)}(t_{ij}, Y_k) \, m(t, Y_k) \right], \\ I_{13n}(s,t) = {} & \frac{1}{n} \sum_{k=1}^{n} \{ m(s, Y_k) A_n(t, Y_k) + m(t, Y_k) A_n(s, Y_k) \}, \end{aligned}$$

which we analyse individually below.

We first show that $E\|I_{11n}\|^2 = O(n^{-1} h_1^{-1})$. We write $I_{11n}(s,t)$ as

$$I_{11n}(s,t) = \frac{1}{2n^2} \sum_{k=1}^{n} \sum_{i=1}^{n} \sum^{(2)} \{h_{ik}(s,t) + h_{ik}(t,s)\} = \frac{1}{2n^2} \sum_{k=1}^{n} \sum_{i=1}^{n} \psi_n(Z_i, Z_k; s,t) = \frac{1}{2} v_n(s,t),$$

where $v_n(s,t)$ is a V-statistic with symmetric kernel $\psi_n(Z_i, Z_k; s,t)$ and

$$h_{ik}(s,t) = \sum_{j=1}^{M} \frac{1(N_i \geq j)}{h_1 E(N) f_T(t)} K_1\!\left( \frac{T_{ij} - t}{h_1} \right) e_{ij}(Y_k) \, m(s, Y_k).$$

Since $E\{e_{ij}(Y_k) \mid T_{ij}, Y_k\} = 0$, it is easy to show that $E\{h_{ik}(s,t)\} = E\{h_{ik}(t,s)\} = E\{h_{ki}(s,t)\} = E\{h_{ki}(t,s)\} = 0$. Thus $\theta_n(s,t) = E\{\psi_n(Z_i, Z_k; s,t)\} = 0$. Additionally,

$$\psi_{1n}(Z_i; s,t) = E\{\psi_n(Z_i, Z_k; s,t) \mid Z_i\} = \sum_{j=1}^{M} \frac{1(N_i \geq j)}{h_1 E(N) f_T(t)} K_1\!\left( \frac{T_{ij} - t}{h_1} \right) E\{e_{ij}(Y_k) m(s, Y_k) \mid Z_i\} + \sum_{j=1}^{M} \frac{1(N_i \geq j)}{h_1 E(N) f_T(s)} K_1\!\left( \frac{T_{ij} - s}{h_1} \right) E\{e_{ij}(Y_k) m(t, Y_k) \mid Z_i\}.$$

If $E\{\psi_n^2(Z_i, Z_k; s,t)\} = o(n)$, Lemma A1 gives $n E\{v_n(s,t) - \tilde{u}_n(s,t)\}^2 = o(1)$, where $\tilde{u}_n(s,t) = 2 n^{-1} \sum_{i=1}^{n} \psi_{1n}(Z_i; s,t)$ is the projection of the corresponding U-statistic. Since the projection of a U-statistic is a sum of independent and identically distributed random variables $\psi_{1n}(Z_i; s,t)$, $E\|I_{11n}\|^2 \leq 2 n^{-1} \int\!\!\int \mathrm{var}[E\{h_{ik}(s,t) \mid Z_i\}] \, ds \, dt + 2 n^{-1} \int\!\!\int \mathrm{var}[E\{h_{ik}(t,s) \mid Z_i\}] \, ds \, dt + o(n^{-1})$, where

$$\begin{aligned} \frac{2}{n} \int\!\!\int \mathrm{var}\big[E\{h_{ik}(s,t) \mid Z_i\}\big] \, ds \, dt &\leq \sum_{j=1}^{M} \int\!\!\int \frac{2 \, \mathrm{P}(N_i \geq j)}{n h_1^2 E(N) f_T^2(t)} E\Big[ K_1^2\Big( \frac{T_{ij} - t}{h_1} \Big) E^2\{e_{ij}(Y_k) m(s, Y_k) \mid Z_i\} \Big] \, ds \, dt \\ &= \sum_{j=1}^{M} \int\!\!\int\!\!\int \frac{2 \, \mathrm{P}(N_i \geq j)}{n h_1 E(N) f_T^2(t)} K_1^2(u) \times E_{X_i, Y_i, \varepsilon_i}\big[ E_{Y_k}^2\{e_{ij}(Y_k) m(s, Y_k) \mid T_{ij} = t + u h_1\} \big] f_T(t + u h_1) \, du \, ds \, dt \\ &\leq \sum_{j=1}^{M} \frac{2 \|K_1\|^2 \, \mathrm{P}(N_i \geq j)}{n h_1 E(N)} \int\!\!\int f_T^{-1}(t) \, E_{X_i, Y_i, \varepsilon_i}\big[ E_{Y_k}^2\{e_{ij}(Y_k) m(s, Y_k) \mid T_{ij} = t\} \big] \, ds \, dt \\ &\leq \frac{8 \|K_1\|^2}{n h_1 B_T^L E(N)} E\|X\|^4 + \frac{4 \|K_1\|^2 \sigma_x^2}{n h_1 B_T^L E(N)} E\|X\|^2 = O\Big( \frac{1}{n h_1} \Big), \end{aligned}$$

where the first line follows from the Cauchy–Schwarz inequality, the second line is obtained by letting $u = h_1^{-1}(T_{ij} - t)$ and observing that $T_{ij}$ is independent of $X_i$, $Y_i$ and $\varepsilon_i$, and the third line follows from a variant of the dominated convergence theorem (Prakasa Rao, 1983, p. 35) that allows us to derive rates of convergence for nonparametric regression estimators. Thus $E\|I_{11n}\|^2 = O(n^{-1} h_1^{-1})$, provided that $E\{\psi_n^2(Z_i, Z_k; s,t)\} = o(n)$ for all $i$ and $k$, which we will show below. For $i \neq k$,

$$E\{\psi_n^2(Z_i, Z_k; s,t)\} = 2 E\{h_{ik}^2(s,t)\} + 2 E\{h_{ik}^2(t,s)\} + 4 E\{h_{ik}(s,t) h_{ik}(t,s)\} + 4 E\{h_{ik}(s,t) h_{ki}(s,t)\} + 4 E\{h_{ik}(s,t) h_{ki}(t,s)\}.$$

Observe that

$$\frac{1}{n} E\{h_{ik}^2(s,t)\} = \sum_{j=1}^{M} \sum_{l=1}^{M} \frac{\mathrm{P}\{N_i \geq \max(j,l)\}}{E^2(N) f_T^2(t)} \times E\left\{ (n h_1^2)^{-1} K_1\!\left( \frac{T_{ij} - t}{h_1} \right) K_1\!\left( \frac{T_{il} - t}{h_1} \right) e_{ij}(Y_k) \, e_{il}(Y_k) \, m^2(s, Y_k) \right\}.$$

For $j = l$, applying the dominated convergence theorem to the expectation on the right-hand side gives $n^{-1} h_1^{-1} \|K_1\|^2 f_T(t) E\{e_{ij}^2(Y_k) m^2(s, Y_k) \mid T_{ij} = t\}$, and hence $n^{-1} E\{h_{ik}^2(s,t)\} = o(1)$ by Assumption A3. For $j \neq l$, a similar argument gives $n^{-1} f_T^2(t) E\{e_{ij}(Y_k) e_{il}(Y_k) m^2(s, Y_k) \mid T_{ij} = T_{il} = t\}$. The next two terms, $E\{h_{ik}^2(t,s)\}$ and $E\{h_{ik}(s,t) h_{ik}(t,s)\}$, can be handled similarly, as can $E\{h_{ik}(s,t) h_{ki}(s,t)\} = o(n)$ and the case $i = k$. Thus $E\{\psi_n^2(Z_i, Z_k; s,t)\} = o(n)$.

Using similar derivations, one can show that $E\|I_{12n}\|^2 = O(h_1^4) + o(n^{-1})$.

We next show that $\|I_{13n}\|^2 = O_p(n^{-1} h_1 + h_1^6)$. Following Lemma 2 of Martins-Filho & Yao (2007),

$$\begin{aligned} A_n(t, Y_k) &= \sum_{j=1}^{M} \sum_{i=1}^{n} \frac{1(N_i \geq j)}{n h_1 E(N)} \left\{ W_n\!\left( \frac{T_{ij} - t}{h_1}, t \right) - f_T^{-1}(t) K_1\!\left( \frac{T_{ij} - t}{h_1} \right) \right\} U_{ij}(t, Y_k) \\ &\leq h_1^{-1} \big[ (1, 0) \{S_n^{-1}(t) - S^{-1}(t)\}^2 (1, 0)^{\mathrm{T}} \big]^{1/2} \times \left[ \left| \sum_{j} \sum_{i} \frac{1(N_i \geq j)}{n E(N)} K_1\!\left( \frac{T_{ij} - t}{h_1} \right) U_{ij}(t, Y_k) \right| + \left| \sum_{j} \sum_{i} \frac{1(N_i \geq j)}{n E(N)} K_1\!\left( \frac{T_{ij} - t}{h_1} \right) \left( \frac{T_{ij} - t}{h_1} \right) U_{ij}(t, Y_k) \right| \right] \\ &= h_1^{-1} \big[ (1, 0) \{S_n^{-1}(t) - S^{-1}(t)\}^2 (1, 0)^{\mathrm{T}} \big]^{1/2} R_n(t, Y_k). \end{aligned}$$

Lemma A2 gives $\sup_{t \in \mathcal{T}} h_1^{-1} \big[ (1,0) \{S_n^{-1}(t) - S^{-1}(t)\}^2 (1,0)^{\mathrm{T}} \big]^{1/2} = O_p(1)$. Next, $R_n(t, Y_k) \leq |R_{n1}(t, Y_k)| + |R_{n2}(t, Y_k)| + |R_{n3}(t, Y_k)| + |R_{n4}(t, Y_k)|$, where

$$\begin{aligned} R_{n1}(t, Y_k) &= \sum_{j=1}^{M} \sum_{i=1}^{n} \frac{1(N_i \geq j)}{n E(N)} K_1\!\left( \frac{T_{ij} - t}{h_1} \right) e_{ij}(Y_k), & R_{n2}(t, Y_k) &= \sum_{j=1}^{M} \sum_{i=1}^{n} \frac{h_1^2 \, 1(N_i \geq j)}{2 n E(N)} K_1\!\left( \frac{T_{ij} - t}{h_1} \right) \left( \frac{T_{ij} - t}{h_1} \right)^2 m^{(2)}(t_{ij}, Y_k), \\ R_{n3}(t, Y_k) &= \sum_{j=1}^{M} \sum_{i=1}^{n} \frac{1(N_i \geq j)}{n E(N)} K_1\!\left( \frac{T_{ij} - t}{h_1} \right) \left( \frac{T_{ij} - t}{h_1} \right) e_{ij}(Y_k), & R_{n4}(t, Y_k) &= \sum_{j=1}^{M} \sum_{i=1}^{n} \frac{h_1^2 \, 1(N_i \geq j)}{2 n E(N)} K_1\!\left( \frac{T_{ij} - t}{h_1} \right) \left( \frac{T_{ij} - t}{h_1} \right)^3 m^{(2)}(t_{ij}, Y_k). \end{aligned}$$

Thus $n^{-1} \sum_k m(s, Y_k) R_{n1}(t, Y_k) = h_1 f_T(t) I_{11n}(s,t)$ leads to $\|h_1 f_T I_{11n}\|^2 = O_p(n^{-1} h_1)$, and $n^{-1} \sum_k m(s, Y_k) R_{n2}(t, Y_k) = h_1 f_T(t) I_{12n}(s,t)$ leads to $\|h_1 f_T I_{12n}\|^2 = O_p(h_1^6)$. It follows similarly that the third and fourth terms are $O_p(n^{-1} h_1)$ and $O_p(h_1^6)$, respectively. Hence $\|I_{13n}\|^2 = O_p(n^{-1} h_1 + h_1^6)$. Combining the previous results gives $\|I_{1n}\|^2 = O_p\{(n h_1)^{-1} + h_1^4\}$.

(b) These terms are of higher order and are omitted for brevity.

(c) By the law of large numbers, $\| n^{-1} \sum_{i=1}^{n} m(\cdot, Y_i) \otimes m(\cdot, Y_i) - \Lambda \|^2 = O_p(n^{-1})$.

Combining the above results leads to $\|\hat{\Lambda} - \Lambda\|^2 = O_p(n^{-1} h_1^{-1} + h_1^4)$.

Proof of Theorem 3

To facilitate the theoretical derivation, for each $k = 1, \ldots, K$ let $\eta_k = \Sigma^{1/2} \beta_k$ and $\hat{\eta}_{k,s_n} = \hat{\Sigma}_{s_n}^{1/2} \hat{\beta}_{k,s_n}$ be, respectively, the normalized eigenvectors of the equations $\Sigma^{-1} \Lambda \Sigma^{-1/2} \eta_k = \lambda_k \beta_k$ and $\hat{\Sigma}_{s_n}^{-1} \hat{\Lambda} \hat{\Sigma}_{s_n}^{-1/2} \hat{\eta}_{k,s_n} = \hat{\lambda}_{k,s_n} \hat{\beta}_{k,s_n}$. Then

$$\begin{aligned} \|\hat{\beta}_{k,s_n} - \beta_k\| &\leq \big\| \hat{\lambda}_{k,s_n}^{-1} \hat{\Sigma}_{s_n}^{-1} \hat{\Lambda} \hat{\Sigma}_{s_n}^{-1/2} - \lambda_k^{-1} \Sigma^{-1} \Lambda \Sigma^{-1/2} \big\| + \lambda_k^{-1} \big\| \Sigma^{-1} \Lambda \Sigma^{-1/2} \big\| \, \| \hat{\eta}_{k,s_n} - \eta_k \| \\ &\leq \hat{\lambda}_{k,s_n}^{-1} \big\| \hat{\Sigma}_{s_n}^{-1} \hat{\Lambda} \hat{\Sigma}_{s_n}^{-1/2} - \Sigma^{-1} \Lambda \Sigma^{-1/2} \big\| + \big\| \Sigma^{-1} \Lambda \Sigma^{-1/2} \big\| \big( |\hat{\lambda}_{k,s_n}^{-1} - \lambda_k^{-1}| + \lambda_k^{-1} \| \hat{\eta}_{k,s_n} - \eta_k \| \big), \end{aligned}$$

using the fact that $\hat{\lambda}_{k,s_n}^{-1} \leq \lambda_k^{-1} + |\hat{\lambda}_{k,s_n}^{-1} - \lambda_k^{-1}|$. Applying standard theory for self-adjoint compact operators (Bosq, 2000) gives

$$|\hat{\lambda}_{k,s_n} - \lambda_k| \leq \big\| \hat{\Sigma}_{s_n}^{-1/2} \hat{\Lambda} \hat{\Sigma}_{s_n}^{-1/2} - \Sigma^{-1/2} \Lambda \Sigma^{-1/2} \big\|, \qquad \| \hat{\eta}_{k,s_n} - \eta_k \| \leq C \big\| \hat{\Sigma}_{s_n}^{-1/2} \hat{\Lambda} \hat{\Sigma}_{s_n}^{-1/2} - \Sigma^{-1/2} \Lambda \Sigma^{-1/2} \big\| \quad (k = 1, \ldots, K),$$

where $C > 0$ is a generic positive constant. Thus $\|\hat{\beta}_{k,s_n} - \beta_k\|^2 = O_p(I_{1n} + I_{2n})$, where

$$I_{1n} = \big\| \hat{\Sigma}_{s_n}^{-1} \hat{\Lambda} \hat{\Sigma}_{s_n}^{-1/2} - \Sigma^{-1} \Lambda \Sigma^{-1/2} \big\|^2, \qquad I_{2n} = \big\| \hat{\Sigma}_{s_n}^{-1/2} \hat{\Lambda} \hat{\Sigma}_{s_n}^{-1/2} - \Sigma^{-1/2} \Lambda \Sigma^{-1/2} \big\|^2.$$

It suffices to show that $I_{1n} = O_p\big[ s_n^{3a+2} (n^{-1/2} h_1^{-1/2} + h_1^2)^2 + s_n^{4a+3} (n^{-1/2} h_2^{-1} + h_2^2)^2 + s_n^{-2b+1} \big]$. The calculations for $I_{2n}$ are similar and yield that $I_{2n} = o_p(I_{1n})$.

Observe that $I_{1n} \leq 3 I_{11n} + 3 I_{12n} + 3 I_{13n}$, where $I_{11n} = \|\Sigma_{s_n}^{-1} \Lambda \Sigma_{s_n}^{-1/2} - \Sigma^{-1} \Lambda \Sigma^{-1/2}\|^2$, $I_{12n} = \|\hat{\Sigma}_{s_n}^{-1} \Lambda \hat{\Sigma}_{s_n}^{-1/2} - \Sigma_{s_n}^{-1} \Lambda \Sigma_{s_n}^{-1/2}\|^2$ and $I_{13n} = \|\hat{\Sigma}_{s_n}^{-1} \hat{\Lambda} \hat{\Sigma}_{s_n}^{-1/2} - \hat{\Sigma}_{s_n}^{-1} \Lambda \hat{\Sigma}_{s_n}^{-1/2}\|^2$. Recall that $\Pi_{s_n} = \sum_{j=1}^{s_n} \phi_j \otimes \phi_j$ is the orthogonal projector onto the eigenspace associated with the $s_n$ largest eigenvalues of $\Sigma$. Let $I$ denote the identity operator and $\Pi_{s_n}^{\perp} = I - \Pi_{s_n}$ the operator perpendicular to $\Pi_{s_n}$, i.e., $\Pi_{s_n}^{\perp}$ is the orthogonal projector onto the eigenspace associated with eigenvalues of $\Sigma$ that are less than $\alpha_{s_n}$. Thus $\Sigma_{s_n}^{-1} \Lambda \Sigma_{s_n}^{-1/2} = \Pi_{s_n} \Sigma^{-1} \Lambda \Sigma^{-1/2} \Pi_{s_n}$ allows us to write $I_{11n}^{1/2} \leq \|\Pi_{s_n}^{\perp} \Sigma^{-1} \Lambda \Sigma^{-1/2}\| + \|\Sigma^{-1} \Lambda \Sigma^{-1/2} \Pi_{s_n}^{\perp}\|$. Since $\Sigma^{-1} \Lambda \Sigma^{-1/2} \eta_k = \lambda_k \beta_k$,

$$\big\| \Pi_{s_n}^{\perp} \Sigma^{-1} \Lambda \Sigma^{-1/2} \big\|^2 \leq \sum_{k=1}^{K} \lambda_k^2 \Big\| \sum_{i > s_n} \sum_{j=1}^{\infty} b_{kj} \langle \phi_i, \phi_j \rangle \phi_i \Big\|^2 \leq \sum_{k=1}^{K} \lambda_k^2 \sum_{i > s_n} b_{ki}^2 \leq C \sum_{k=1}^{K} \lambda_k^2 \sum_{i > s_n} i^{-2b} = O(s_n^{-2b+1});$$

similarly, $\|\Sigma^{-1} \Lambda \Sigma^{-1/2} \Pi_{s_n}^{\perp}\|^2 = O(s_n^{-2b+1})$.

We decompose $I_{12n}$ as $I_{12n} \leq 3 I_{121n} + 3 I_{122n} + 3 I_{123n}$, where $I_{121n} = \|(\Sigma_{s_n}^{-1} - \hat{\Sigma}_{s_n}^{-1}) \Lambda \Sigma_{s_n}^{-1/2}\|^2$, $I_{122n} = \|\Sigma_{s_n}^{-1} \Lambda (\Sigma_{s_n}^{-1/2} - \hat{\Sigma}_{s_n}^{-1/2})\|^2$ and $I_{123n} = \|(\Sigma_{s_n}^{-1} - \hat{\Sigma}_{s_n}^{-1}) \Lambda (\Sigma_{s_n}^{-1/2} - \hat{\Sigma}_{s_n}^{-1/2})\|^2$. Note that $I_{121n} \leq 6 \|\Lambda \Sigma^{-1/2} \Pi_{s_n}\|^2 (I_{1211n} + I_{1212n})$, where

$$I_{1211n} = \Big\| \sum_{j=1}^{s_n} (\alpha_j^{-1} - \hat{\alpha}_j^{-1}) \, \hat{\phi}_j \otimes \hat{\phi}_j \Big\|^2, \qquad I_{1212n} = \Big\| \sum_{j=1}^{s_n} \alpha_j^{-1} (\hat{\phi}_j \otimes \hat{\phi}_j - \phi_j \otimes \phi_j) \Big\|^2.$$

Under Assumption 7, for all $1 \leq j \leq s_n$, $|\hat{\alpha}_j - \alpha_j| \leq \|\hat{\Sigma} - \Sigma\| \leq 2^{-1}(\alpha_j - \alpha_{j+1})$ implies that $\hat{\alpha}_j \geq 2^{-1}(\alpha_j + \alpha_{j+1}) \geq C^{-1} j^{-a}$, i.e., $\hat{\alpha}_j^{-1} \leq C j^{a}$ for some $C > 0$. Thus

$$I_{1211n} \leq \sum_{j=1}^{s_n} \frac{(\hat{\alpha}_j - \alpha_j)^2}{(\alpha_j \hat{\alpha}_j)^2} \leq C \|\hat{\Sigma} - \Sigma\|^2 \sum_{j=1}^{s_n} j^{4a} = O_p\{s_n^{4a+1} (n^{-1} h_2^{-2} + h_2^4)\}.$$

For $I_{1212n}$, using the fact that $\|\hat{\phi}_j - \phi_j\| \leq 2\sqrt{2}\, \delta_j^{-1} \|\hat{\Sigma} - \Sigma\|$ (Bosq, 2000), where $\delta_1 = \alpha_1 - \alpha_2$ and $\delta_j = \min_{2 \leq \ell \leq j} (\alpha_{\ell-1} - \alpha_\ell, \alpha_\ell - \alpha_{\ell+1})$ for $j > 1$, we have that $\delta_j^{-1} \leq C j^{a+1}$ and

$$I_{1212n} \leq 2 \sum_{j=1}^{s_n} \alpha_j^{-2} \|\hat{\phi}_j - \phi_j\|^2 \leq C \|\hat{\Sigma} - \Sigma\|^2 \sum_{j=1}^{s_n} j^{4a+2} = O_p\{s_n^{4a+3} (n^{-1} h_2^{-2} + h_2^4)\}.$$

Using $\Lambda \Sigma^{-1/2} \eta_k = \lambda_k \Sigma \beta_k$, we obtain $\|\Lambda \Sigma^{-1/2} \Pi_{s_n}\|^2 \leq \sum_{k=1}^{K} \lambda_k^2 \| \sum_{i=1}^{s_n} \alpha_i \sum_{j=1}^{\infty} b_{kj} \langle \phi_i, \phi_j \rangle \phi_i \|^2 \leq \sum_{k=1}^{K} \lambda_k^2 \sum_{i=1}^{s_n} \alpha_i^2 b_{ki}^2 < \infty$. Thus $I_{121n} = O_p\{s_n^{4a+3}(n^{-1} h_2^{-2} + h_2^4)\}$. Using decompositions similar to the one for $I_{121n}$, both $I_{122n}$ and $I_{123n}$ can be shown to be $o_p\{s_n^{4a+3}(n^{-1} h_2^{-2} + h_2^4)\}$. This leads to $I_{12n} = O_p\{s_n^{4a+3}(n^{-1} h_2^{-2} + h_2^4)\}$.

Note that $I_{13n} \leq \|\hat{\Sigma}_{s_n}^{-1}\|^2 \|\hat{\Lambda} - \Lambda\|^2 \|\hat{\Sigma}_{s_n}^{-1/2}\|^2$, where $\|\hat{\Sigma}_{s_n}^{-1}\|^2 \leq \sum_{j=1}^{s_n} \hat{\alpha}_j^{-2} \leq C \sum_{j=1}^{s_n} j^{2a} = O_p(s_n^{2a+1})$ and, similarly, $\|\hat{\Sigma}_{s_n}^{-1/2}\|^2 = O_p(s_n^{a+1})$. From Theorem 2, we have $I_{13n} = O_p\{s_n^{3a+2}(n^{-1} h_1^{-1} + h_1^4)\}$. Combining the above results leads to (6).

REFERENCES

  1. Bosq D. Linear Processes in Function Spaces: Theory and Applications. Springer; New York: 2000.
  2. Cai TT, Hall P. Prediction in functional linear regression. Ann. Statist. 2006;34:2159–79.
  3. Cambanis S, Huang S, Simons G. On the theory of elliptically contoured distributions. J. Mult. Anal. 1981;11:368–85.
  4. Cardot H, Ferraty F, Sarda P. Functional linear model. Statist. Prob. Lett. 1999;45:11–22.
  5. Chen D, Hall P, Müller H-G. Single and multiple index functional regression models with nonparametric link. Ann. Statist. 2011;39:1720–47.
  6. Chiaromonte F, Cook DR, Li B. Sufficient dimension reduction in regressions with categorical predictors. Ann. Statist. 2002;30:475–97.
  7. Cook DR. Regression Graphics: Ideas for Studying Regressions through Graphics. Wiley; New York: 1998.
  8. Cook DR, Weisberg S. Comment on “Sliced inverse regression for dimension reduction”. J. Am. Statist. Assoc. 1991;86:328–32.
  9. Cook DR, Forzani L, Yao A-F. Necessary and sufficient conditions for consistency of a method for smoothed functional inverse regression. Statist. Sinica. 2010;20:235–8.
  10. Duan N, Li K-C. Slicing regression: A link-free regression method. Ann. Statist. 1991;19:505–30.
  11. Fan J, Gijbels I. Local Polynomial Modelling and Its Applications. Chapman & Hall; London: 1996.
  12. Ferré L, Yao A-F. Functional sliced inverse regression analysis. Statistics. 2003;37:475–88.
  13. Ferré L, Yao A-F. Smoothed functional inverse regression. Statist. Sinica. 2005;15:665–83.
  14. Hall P, Horowitz JL. Methodology and convergence rates for functional linear regression. Ann. Statist. 2007;35:70–91.
  15. Hall P, Hosseini-Nasab M. On properties of functional principal components analysis. J. R. Statist. Soc. B. 2006;68:109–26.
  16. Hall P, Müller H-G, Wang J-L. Properties of principal component methods for functional and longitudinal data analysis. Ann. Statist. 2006;34:1493–517.
  17. Hastie TJ, Tibshirani RJ. Generalized Additive Models. Chapman & Hall; London: 1990.
  18. He G, Müller H-G, Wang J-L. Functional canonical analysis for square integrable stochastic processes. J. Mult. Anal. 2003;85:54–77.
  19. James GM, Silverman BW. Functional adaptive model estimation. J. Am. Statist. Assoc. 2005;100:565–76.
  20. Jiang C-R, Yu W, Wang J-L. Inverse regression for longitudinal data. Ann. Statist. 2014;42:563–91.
  21. Li B, Wang S. On directional regression for dimension reduction. J. Am. Statist. Assoc. 2007;102:997–1008.
  22. Li K-C. Sliced inverse regression for dimension reduction. J. Am. Statist. Assoc. 1991;86:316–27.
  23. Li K-C. On principal Hessian directions for data visualization and dimension reduction: Another application of Stein's lemma. J. Am. Statist. Assoc. 1992;87:1025–39.
  24. Li Y, Hsing T. Deciding the dimension of effective dimension reduction space for functional and high-dimensional data. Ann. Statist. 2010;38:3028–62.
  25. Lin X, Carroll RJ. Nonparametric function estimation for clustered data when the predictor is measured without/with error. J. Am. Statist. Assoc. 2000;95:520–34.
  26. Liu B, Müller H-G. Estimating derivatives for samples of sparsely observed functions, with application to on-line auction dynamics. J. Am. Statist. Assoc. 2009;104:704–14.
  27. Martins-Filho C, Yao F. A note on the use of V and U statistics in nonparametric models of regression. Ann. Inst. Statist. Math. 2006;58:389–406.
  28. Martins-Filho C, Yao F. Nonparametric frontier estimation via local linear regression. J. Economet. 2007;141:283–319.
  29. Müller H-G, Stadtmüller U. Generalized functional linear models. Ann. Statist. 2005;33:774–805.
  30. Prakasa Rao BLS. Nonparametric Functional Estimation. Academic Press; Orlando, Florida: 1983.
  31. Ramsay JO, Silverman BW. Functional Data Analysis. 2nd ed. Springer; New York: 2005.
  32. Rice JA, Silverman BW. Estimating the mean and covariance structure nonparametrically when the data are curves. J. R. Statist. Soc. B. 1991;53:233–43.
  33. Xia Y, Tong H, Li W, Zhu L-X. An adaptive estimation of dimension reduction space. J. R. Statist. Soc. B. 2002;64:363–410.
  34. Yao F, Müller H-G. Empirical dynamics for longitudinal data. Ann. Statist. 2010;38:3458–86.
  35. Yao F, Müller H-G, Wang J-L. Functional data analysis for sparse longitudinal data. J. Am. Statist. Assoc. 2005a;100:577–90.
  36. Yao F, Müller H-G, Wang J-L. Functional linear regression analysis for longitudinal data. Ann. Statist. 2005b;33:2873–903.
  37. Yuan M, Cai TT. A reproducing kernel Hilbert space approach to functional linear regression. Ann. Statist. 2010;38:3412–44.
  38. Zhu L-P, Zhu L-X, Feng Z-H. Dimension reduction in regressions through cumulative slicing estimation. J. Am. Statist. Assoc. 2010;105:1455–66.
