Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2012 Apr 1.
Published in final edited form as: J Multivar Anal. 2011 Apr 1;102(4):741–767. doi: 10.1016/j.jmva.2010.12.001

Some theoretical properties of Silverman's method for Smoothed functional principal component analysis

Xin Qi 1,*, Hongyu Zhao 1
PMCID: PMC3079282  NIHMSID: NIHMS259339  PMID: 21516205

Abstract

Principal component analysis (PCA) is one of the key techniques in functional data analysis. One important feature of functional PCA is that there is a need for smoothing or regularizing of the estimated principal component curves. Silverman's method for smoothed functional principal component analysis is an important approach in situation where the sample curves are fully observed due to its theoretical and practical advantages. However, lack of knowledge about the theoretical properties of this method makes it difficult to generalize it to the situation where the sample curves are only observed at discrete time points. In this paper, we first establish the existence of the solutions of the successive optimization problems in this method. We then provide upper bounds for the bias parts of the estimation errors for both eigenvalues and eigenfunctions. We also prove functional central limit theorems for the variation parts of the estimation errors. As a corollary, we give the convergence rates of the estimations for eigenvalues and eigenfunctions, where these rates depend on both the sample size and the smoothing parameters. Under some conditions on the convergence rates of the smoothing parameters, we can prove the asymptotic normalities of the estimations.

Keywords: Functional PCA, smoothing methods, roughness penalty, convergence rates, functional central limit theorem, asymptotic normality

1. Introduction

Principal component analysis (PCA) is one of the key techniques in multivariate analysis and functional data analysis. An important difference between classical PCA and functional PCA is that there is a need for smoothing or regularizing of the estimated principal component curves in functional PCA (see Chapter 9 in Ramsay and Silverman [12]). Many methods have been proposed to estimate the smoothed functional principal components when the sample curves are fully observed. A general overview of these methods and an extensive list of references can be found in Ramsay and Silverman [12]. The reader can find in Ferraty and Vieu [6] more discussions on theoretical aspects and nonparametric methods for functional data analysis. Functional PCA has many important applications. For example, functional principal component regression (see for instance Cardot, Ferraty and Sarda [2]) is a direct application of functional principal coponents analysis.

The approach proposed in Silverman [15] is an important method for smoothing functional PCA (see Chapter 9 in Ramsay and Silverman [12]) due to its theoretical and practical advantages. First, the weak assumptions underlying this method make it applicable to data from many fields. Silverman [15] did not make any assumptions on the mean curves and sample curves. Hence, in addition to data with smooth random curves, this method can be applied to analyze data where the sample curves can be unsmooth or even discontinuous, such as those encountered in financial engineering, survival analysis and other fields. For covariance functions, Silverman [15] only assumed that they have series expansions by their eigenfunctions without imposing smoothing constraint. This is attractive because the covariance functions are continuous but unsmooth in many important models such as stochastic differential equation models in financial engineering and counting process models in survival analysis. Second, Silverman's method controls the smoothness of eigenfunction curves by directly imposing roughness penalties on these functions instead of on sample curves or covariance functions. Furthermore, this approach changes the eigenvalue and eigenfunction problems in the usual L2 space to problems in another Hilbert space, the Sobolev space (with a norm different from the usual norm in the Sobolev space). Therefore, many powerful tools from the theory of Hilbert space can be employed to study the properties of this method. Third, this approach incorporates the smoothing step into the step for computing eigenvalues and eigenfunctions. Therefore, this method is computationally efficient with the same computational load as the usual unsmoothed functional PCA. Fourth, the estimates produced by this method are invariant under scale transformations. As pointed out by Huang, Shen and Buja [8], the invariance property under scale transformations should be a guiding principle in introducing roughness penalties to functional PCA.

Despite all these advantages, lack of knowledge about the theoretical properties of this method makes it difficult to generalize it to the situations where the sample curves are only observed at discrete time points. Silverman [15] only proved consistency of the estimations as the sample size goes to infinity and the smoothing parameter goes to zero. Even the existence of the solutions to the successive optimization problems in this method is not established. It is not clear how the estimation errors depend on the sample size and the smoothing parameter. Asymptotic normalities of the estimations also need to be proved. In this paper, we aim to solve these open problems. In Section 2, we give the detailed backgroud, basic notations and our main assumptions. In Section 3, Silverman's method is introduced and the existence theorem for the successive optimization problems is proven. Our main results appear in Section 4. Section 5 contains detailed proofs of our theorems.

2. Notations and main assumptions

We introduce notations and definitions used throughout the paper. Let ℕ denote the collection of all the positive integers. We consider a finite time interval [a, b]. In this paper, we will mainly consider functions in the following two space, the L2 space

L2([a,b])={f:fis a measurable function on[a,b]andab|f(t)|2dt<},

and the Sobolev space

W22([a,b])={f:f,fare absolutely continuous on[a,b]andfL2([a,b])},

where f′ and f″ donate the first and second derivatives of f, respectively. For any f, gL2([a, b]), define the usual inner product

(f,g)=abf(t)g(t)dt,

with corresponding squared norm ∥f2 = (f, f). Given a smoothing parameter α > 0, for any f, gW22([a,b]), define

[f,g]=abf(t)g(t)dt

and the inner product

(f,g)α=(f,g)+α[f,g]

with corresponding squard norm fα2=(f,f)α. Note that is α = 0, we return to the L2([a, b]) space. For any bounded operator B from L2([a, b]) to L2([a, b]), define the norm

B=sup{Bf:fL2([a,b])andf1}. (2.1)

For any measurable function A(s, t) on [a, b] × [a, b], if

ababA2(s,t)dsdt<,

then fabA(s,t)f(t)dt defines a bounded operator from L2([a, b) to L2([a, b]). To simplify the notation, we just use A to denote this operator, that is

Af(s)=abA(s,t)f(t)dt,

and we have

A(ababA2(s,t)dsdt)12.

Let X(t), atb be a measurable stochastic process on [a, b]. Under Assumption 1 below, X(t) ∈ L2([a, b]) a.s.. Let {X1(t), X2(t), ⋯, Xn(t)} be i.i.d. sample curves from the distribution of X(t). Assume that EX(t) = ν(t). Define Γ to be the covariance function

Γ(s,t)=E[(X(s)ν(s))(X(t)ν(t))],s,t[a,b],

and Γ̂n to be the sample covariance function

Γ^n(s,t)=1nΣp=1n(Xp(s)X(s))(Xp(t)X(s)),s,t[a,b],

where is the sample mean curve

X(t)=1n(X1(t)++Xn(t)).

We will give our basic assumptions below. Silverman [15] made three assumptions in Section 5.2 in order to prove the consistency result. Our assumptions are stronger than those in Silverman [15].

Assumption 1

E[X4]=E[(ab|X(t)|2dt)2]<. (2.2)

Remark

  1. This assumption is stronger than the first assumption in Section 5.2 of Silverman [15]. Under condition (2.2), the central limit theorem for sample covariance function holds (see Section 2 in Dauxois, Pousse and Romain [3] and Chapter 10 in Ledoux and Talagrand [10]).

  2. Assumption 1 is satisfied by many stochastic processes used in applications. For example, if X(t) is a bounded process, it is obvious that (2.2) is true. Gaussian processes are an important class of stochastic processes which are widely used in statistics and other areas. Suppose that X(t) is a Gaussian process with mean zero. Then
    E[X4]=E[(ab|X(t)|2dt)2]=ababE[X(t)2X(s)2]dtds=abab[Γ(s,s)Γ(t,t)+2Γ(s,t)2]dsdtabab3Γ(s,s)Γ(t,t)dsdt=3[abΓ(t,t)dt]2.
    Hence if Γ(t, t) is integrahle in [a, b], which is satisfied by Gaussian processes commonly encountered in applications, (2.2) is true. Now let us consider the standard Brovmian motion, the most widely studied Gaussian process. For the standard Brovmian motion, Γ(t, t) = t, hence Assumption 1 is satisfied. It is well known that its sample paths are continuous and nowhere differentiable almost surely. For non-Gaussian processes, let us consider a Poisson process with rate 1 in [0,1], Its sample paths are step functions only taking integer values and hence discontinuous. It is easy to verify that Assumption 1 is satisfied by Poisson processes.
  3. Under condition (2.2), we have
    ababΓ(s,t)2dsdt=abab(E[(X(s)ν(s))(X(t)ν(t))])2dsdt=abab(EX(t)X(s)ν(s)ν(t))2dsdtabab2(EX(t)X(s))2+2ν(s)2ν(t)2dsdtabab2EX2(t)X2(s)+2ν(s)2ν(t)2dsdt4E[X4]<.
    Therefore, the operator Γ is a Hilbert-Schmidt operator, hence it is a compact operator (see Section XI.6 in Dunford and Schwartz [5] or Section 97 in Riesz and Sz.-Nagy [13]). It follows that the set of eigenvalues of this operator are bounded and at most countable with at most one limit point at 0. Because the covariance operator Γ is always nonnegative-definite, all the eigenvalues are nonnegative. Let λ1 ≥ λ2 ≥ ⋯ ≥ 0 be the collection of all eigenvalues and the corresponding eigenfunctions are γ1, γ2, ⋯. Every eigenfunction has been scaled to have L2-norm, 1. The set of all the eigenfunctions forms an orthonormal basis of L2([a, b]). Furthermore, we have decomposition
    Γ(s,t)=Σj=1λjγj(s)γj(t), (2.3)
    the series on the right hand side converges in the L2 sense. If Γ is a continuous function, the series on the right hand side absolutely and uniformly converges. Although Silverman [15] did not assume that Γ is square integrahle, he assumed the decomposition form of (2.3).
  4. We have
    Γγj=λjγj,j=1,2,.
  5. By (2.2), X(s) is square integrable a.s.. Hence, the sample covariance functions Γ̂n satisfies
    ababΓ^n(s,t)2dsdt<
    a.s.. Then we have that the eigenvalues λ̂1 ≥ λ̂2 ≥ ⋯ ≥ 0 since the operator Γ̂ is nonnegative-definite. The corresponding eigenfunctions γ̂j, j ∈ ℕ satisfying
    Γ^nγ^j=λ^jγ^j,j=1,2,.

Suppose that we are interested in estimating the first K eigenvalues and eigenfunctions of Γ.

Assumption 2

Any eigenvalue λj, 1 ≥ j ≤ K has multiplicity 1, so that

λ1>λ2>>λK>λK+1.

Remark

This assumption is just the third assumption in Section 5.2 of Silverman [15]. If an eigenvalue has multiplicity 1, then the corresponding eigen-function is uniquely determined up to a sign. If the multiplicity is larger than 1, the eigenfunctions can not he uniquely determined up to a sign.

Assumption 3

The eigenfunctions λj, 1 ≤ j ≤ K belong to W22([a,b])

Remark

  1. This assumption is the second assumption in Section 5.2 of Silverman [15] and is essential in our paper.

  2. If the covariance function Γ satisfies some smoothness conditions, then Assumption 3 is true. For example, suppose that Γ(s, t), Γ(s,t),Γ(s,t)s and 2Γ(s,t)s2 are all continuous on [a, b] × [a, b] (hence they are bounded and square integrable), one can easily verify that
    λkγk(s)=ab2Γ(s,t)s2γk(t)dt1kK.
    Hence, by Cauchy-Schwarz inequality and ∥γk∥ = 1, we have
    λk2ab(γk(s))2dsabab(2Γ(s,t)s2)2dsdt<1kK.
  3. There are many important random processes whose covariance matrices are not smooth, but the eigenfunctions corresponding to nonzero eigenvalues belong to W22([a,b]). The simplest examples are standard Brownian motion and Poisson process with rate 1 in time interval [0, 1]. Their covariance functions are the same and equal to min(s, t), 0 ≤ s, t ≤ 1 (see Page 89 in the book Glasserman [7]). The eigenvalues and eigenfunctions are
    λj=(2(2j1)π)2,γj=2sin((2j1)πt2),j=1,2,. (2.4)
    The next example is the famous Black-Scholes Model in finance. Let St denote the price of a stock at time t. Then St satisfies the following SDE,
    dSt=μStdt+σStdWt,
    where μ, is the instantaneous mean return, σ is the instantaneous return volatility and Wt is a Brownian motion. The covariance function of St is smooth except at the points on the diagonal line {(s, t) : s = t}. The same is true for the following example. Consider the counting processes model in survival analysis. Let Nt be the number of the occurrences of the event in [0, t]. Then Nt satisfies
    dNt=λ(t)dt+dMt,
    where λ(t) is a smooth intensity function and Mt is a martingale.

Silverman [15] introduced a “half-smoothing” operator which plays an important role in this paper. We give a strict definition of this operator here. We first define an unbounded operator L in L2([a, b]). The domain of L is 𝒟(L) = {fL2 ([a, b]) : f, f′ are absolutely continous and f″ ∈ L2 ([a, b])}, and for any f ∈ 𝒟(L),

Lf=f.

Then L is a closed but unbounded operator and 𝒟(L) is dense in L2([a, b]) (for the definition of closed operators, see Chapter VIII of Riesz and Sz.-Nagy [13] or Chapter 13 of Rudin [14]). Let L* be the adjoint operator of L. By the theorem in Section 118 of Riesz and Sz.-Nagy [13] or Theorem 13.13 in Rudin [14], (I + αL*L)−1 is a bounded, positive self-adjoint operator with norm less than or equal to 1, where α ≥ 0 is the smoothing parameter. Now it follows from Theorem 12.33 and 13.31 in Rudin [14] that (I + αL*L)−1 has a unique positive and self-adjoint square root Sα with norm less than or equal to 1 which is the “half-smoothing” operator in Silverman [15]. Therefore,

Sα2=(I+αLL)1, (2.5)

and by Theorem 13.11 (b) in Rudin [14], the inverse Sα1 exists and is self-adjoint because (I + αL*L)−1 is invertible.

3. Silverman's approach to smoothed functional PCA

In this section, we always assume that the independent sample curves

{X1(t),X2(t),,Xn(t):atb}

are entirely observed. We first consider the usual population functional principal components. The first population functional principal component is defined as the linear functional ℓ1(X) of X which maximizes

Var((X))

over all nonzero linear functionals ℓ in L2([a, b]) with the norm ∥ℓ∥ = 1. The second population functional principal component is defined as the linear functional ℓ2(X) of X which maximizes

Var((X))

over all linear functional with the norm ‖‖ = 1 and uncorrelated with 1(X). Similarly, we can define all the other population functional principal components, 3(X), …. Because X takes values in L2([a, b]) which is a real Hilbert space, by the Riesz representation theorem, for any bounded linear functional , there is a unique γL2([a, b]), such that for any fL2([a, b]),

(f)=(γ,f)and=γ.

Hence there exists γj, ∈ L2([a, b]), j ∈ ℕ, with ‖γj‖ = 1, such that the population functional principal components j(X) = (γj, X),j ∈ ℕ. γj is called the j-th principal component weight function or j-th principal component curve. Because

Var(j(X))=Var(γj,X)=(γj,Γγj),j,

γ1 is the solution of the following optimization problem,

maxγ=1(γ,Γγ)γ2. (3.1)

The maximum value of (3.1) is just the largest eigenvalue Ai of λ1 and γ1 is the corresponding eigenfunction (see Section 2, Chapter 3 in Weinberger [16]). γ2 is the solution of the optimization problem,

maxγ=1,(γ,γ1)=0(γ,Γγ)γ2. (3.2)

The maximum value of (3.2) is just the second eigenvalue λ2 of Γ and γ2 is the corresponding eigenfunction. Similarly, γj is the eigenfunction corresponding to the eigenvalue λj which is also the variance of the j-th principal component.

Because the covariance function Γ is usually unknown, we can not obtain the population principal component weight functions directly. Hence, people use the sample covariance function Γ̂n to estimate Γ and use the eigenvalues and eigenfunctions of Γ̂n to estimate the eigenvalues and eigenfunctions of Γ. We call them non-smooth estimators. However, the non-smooth principal component curves can show substantial variability (see Chapter 9 in Ramsay and Silverman [12]). There is a need for smoothing of the estimated principal component weight functions.

Silverman [15] (see also Chapter 9 in Ramsay and Silverman [12]) proposed a method of incorporating smoothing by replacing the usual L2 norm with a norm that takes the roughness of the functions into account. Let α be a nonnegative smoothing parameter. Define the estimators {(λ^j[α],γ^j[α]):j} of {(λj, γj) : j ∈ ℕ} to be the solutions of the following successive optimization problems. First, γ^1[α] is the solution of the optimization problem

maxγ=1(γ,Γ^nγ)(γ,γ)+α[γ,γ]=maxγ=1(γ,Γ^nγ)γα2. (3.3)

Let λ^1[α] be the maximum value of (3.3). For any k ∈ ℕ, if we have obtained {γ^j[α],j=1,2,,k1} and {λ^j[α],j=1,2,,k1}, γ^k[α] is the solution of the optimization problem

maxγ=1,(γ,γ^j[α])α=0,j=1,,k1(γ,Γ^nγ)γα2, (3.4)

and λ^k[α] is the maximum value of (3.4). Note that {(λ^j[α],γ^j[α]):j} depends on both the sample size n and the smoothing parameter α.

First of all, we need to show that the solutions {(λ^j[α],γ^j[α]):j} of the successive optimization problems (3.3) and (3.4) exist.

Theorem 3.1

Under Assumption 1, the solutions {(λ^j[α],γ^j[α]):j} of the successive optimization problems (3.3) and (3.4) exist for any α ≥ 0 almost surely. Moreover, we have, for any γW22([a,b]) and j ∈ ℕ,

(Γ^nγ^j[α],γ)=λ^j[α](γ^j[α],γ)α. (3.5)

Similarly, define {(λj[α],γj[α]):j} to be the solutions of the successive optimization problems (3.3) and (3.4) with Γ̂n replaced by Γ Similarly, we have the following equalities for Γ and {(λj[α],γj[α]):j}

(Γγj[α],γ)=λj[α](γj[α],γ)α,j,γW22([a,b]) (3.6)

Note that

γj[0]=γj,λj[0]=λj,γ^j[0]=γ^j,λj[0]=λ^j,j.

Theorem 1 in Silverman [15] gives the consistency of the estimators

{(λ^j[α],γ^j[α]):j}

as α → 0 and n → ∞.

4. Asymptotic theory

Fix a positive integer K. We will assume throughout this section that we want to estimates the first K principal component curves. For any 1 ≤ kK, define

Lk=max1jk[γj,γj].

Then under Assumption 3, Lk is finite and is a measure of roughness of the first k eigenfunctions of Γ. For standard Brownian motion and Poisson process with rate 1 (see remark (3) after Assumption 3),

Lk=((2k1)π2)2,k=1,2,.

For any 1 ≤ kK, we have decompositions

λ^k[α]λk=(λ^k[α]λk[α])+(λk[α]λk), (4.1)
γ^k[α]γk=(γ^k[α]γk[α])+(γk[α]γk). (4.2)

The last terms λk[α]λk,γk[α]γk on the right hand sides of both (4.1) and (4.2) are nonrandom. They are the “bias terms” due to the introduction of α. We will give the upper bounds for norms of these terms. The first terms on the right hand sides of both (4.1) and (4.2) are the “variation terms” due to the randomness of the sample curves. We will prove a functional central limit theorem for these terms. In order to avoid any confusion it should be pointed out that (4.1) and (4.2) are not the bias-variance decompositions in the strict sense because λk[α] and γk[α] are not the expectations of λ^k[α] and γ^k[α] respectively. Since it is hard to express or characterize the exact expectations of λ^k[α] and γ^k[α], the asymptotic properties of the usual bias and variation terms in the strict sense may not be easily studied. Heuristic calculations of the usual bias and variation terms in the strict sense were performed in Section 6 of Silverman [15].

Note that even if the multiplicity of λk is one, we can not uniquely determine γk because −γk is also an eigenfunction. In the following theorem, by “Given γk”, we mean that not only γk is an eigenfunction, but also the direction of γk is given.

Define

α0=min1kK{min{1+2k(λk1λk)2(k1)λkΓ12kLk2,λkλk+1(8k+16k)Lk2λk,(λk1λk){1+2Γλkλk+1}1242k(k1)Lk2λk}}. (4.3)

Theorem 4.1

Under Assumptions 1 – 3, for any 1 ≤ kK and 0 ≤ αα0,

0λkλk[α]2kLk2λkα(1+O(kLk2λkλkλk+1α+k(k1)Lk4λk2Γ(λk1λk)2(λkλk+1)α2)). (4.4)

Given γk, 1 ≤ kK, we can uniquely choose γk[α] for each α ∈ [0, α0] such that γk[α] is a continuous function of α and (γk[α],γk)>0 for all 0 ≤ αα0, and we have

γk[α]γka42kLk2λkλkλk+1+α4k(k1)Lk4(λkλk1λk)2{1+2Γλkλk+1}. (4.5)

Remark

  1. If K is fixed or hounded, we have
    0λkλk[a]2kLk2λkα+o(α),γk[α]γka42kLk2λkλkλk+1+o(α)
    Hence, the convergence rates for eigenvalues and eigenfunctions are different. Eigenvalues have faster convergence rates than eigenfunctions.
  2. As K → ∞, we have α0 →. If we choose α in such a way that 0 ≤ αα0 and the right hand sides of (4.4) and (4.5) converges to zero, then λk[α]λk and γk[α]γk for all 1 ≤ kK.

  3. The convergence rates for both eigenvalues and eigenfunctions depend on Lk. If the eigenfunctions are less smooth, that is, Lk is large, then the convergence is slow.

  4. (4.4) and (4.5) give the upper bounds. However, the lower bounds are 0 for any k ∈ ℕ. Here is a simple example. Without loss of generality, let k = 2. Suppose [a, b] = [0, 2π],
    Γ(s,t)=2πcos(s)cos(t)+12π+12πsin(s)sin(t)+1πΣm=2(12m)3cos(ms)cos(mt)+1πΣm=2(12m+1)3sin(ms)sin(mt).
    Note that the right hand side in the above equality converges both uniformly and in L2([0, 2π] × [0, 2π]) to a strictly positive definite covariance functions. Its first eigenvalue and eigenfunction are 2 and 1πcos(t), the second ones are 1 and 12π. It is interesting to note that the eigenfunctions of Γ are the same as the solutions of the successive optimization problems (3.3) and (3.4). The first maximum value of the successive optimization problems (3.3) and (3.4) is 21+α and the second one is still 1. That is, in this case, we have λ2[α]=λ2 and γ2[α]=γ2 for any α, hence the lower bounds are zeros.

Define C[0, α0] to be the normed space of all continuous real functions in [0, α0] equipped with norm sup0≤αα0. | · |. Let Π1≤jK Cℝ[0, α0] denote the product space of K copies of Cℝ[0, α0] Define CL2([a,b]) [0, α0] to be the normed space of all continuous functions in [0, α0] taking values in L2([a,b]) equipped with norm sup0≤αα0 ‖ · ‖. Similarly, we define Π1≤jK CL2([a,b])[0, α0].

For each 1 ≤ kK and each n, we will view n(γ^k[α]γk[α]) as a stochastic process with index α ∈ [0, α0] and values in L2[a, b] and view n(λ^k[α]λk[α]) as a stochastic process with index α ∈ [0, α0] and values in ℝ. However, in the following subset in the probability space,

Ω0={ω:there exists at least oneα[0,α0]such thatλ^1[α],,λ^K[α]are not mutually different}, (4.6)

γ^1[α],,γ^K[α] are not uniquely determined up to signs. We will show that Ω0 is measurable and its probability goes to zero as n → ∞ in the proof of the following theorem. Hence, how to define γ^1[α],,γ^K[α] in Ω0 does not affect our asymptotic results. In order to make the development of our theory easier, we will use the following definition

inΩ0,defineγ^k[α]=0,1kK. (4.7)

Theorem 4.2

Under Assumptions 1 — 3 and the definition (4.7), we can properly choose γ^k[α] in Ω0c to make the sequence

{n(γ^k[α]γk[α]),1kK,0αα0}n (4.8)

of stochastic processes is measurable and has sample paths in

k=1KCL2([a,b])[0,α0]

a.s. . Furthermore, the sequence converges in distribution to a Gaussian random, element with values in k=1KCL2([a,b])[0,α0] and mean zero. Similarly, the sequence

{n(λ^k[α]λk[α]),1kK,0αα0}n (4.9)

of stochastic processes has sample paths in k=1KC[0,α0] a.s. and converges in distribution to a Gaussian random, element with values in k=1KC[0,α0] and mean zero.

Remark

  1. Recall the definition of Guassian random elements in a separable Banach space. Suppose that X is a random element with values in a Banach space B with mean zero. Then X is a Guassian element if for any bounded linear functional f, f(X) is a Guassian random, variable. If X is a Guassian random, element, we can define its covariance operator Q. Q is a bounded operator from the dual space B′ to B such that for any f, g ∈ B′, g(Qf) = E [f(X)g(X)]. Note that the distribution of a Gaussian element with values in a Banach space and mean zero is determined by its covariance operator. For further properties of Guassian random elements in Banach spaces, see Ledoux and Talagrand [10].

  2. The covariance operators (4.8) and (4.9) can be characterized by the “half-smoothing” operator Sα defined in (2.5) and the limit distribution of n(Γ^nΓ). However, the characterization involves some technical definitions. The reader can find the characterization in the proof of this theorem.

  3. The measurabilities and a.s. continuities of the sample paths of the processes (4.8) and (4.9) are not obvious at all.

  4. The convergences of (4.8) and (4.9) are weak convergences of probability measures in spaces k=1KC[0,α0] and k=1KCL2([a,b])[0,α0], which are stronger than the convergences of only the marginal distributions of (4.8) and (4.9).

Now from Theorem 4.1 and Theorem 4.2, we have the following corollaries.

Corollary 4.1

Under Assumptions 1 – 3, for any 1 ≤ kK and 0 ≤ αα0

|λ^k[α]λk||λ^k[α]λk[α]|+2kLk2λkα+o(α),γ^k[α]γk|γ^k[α]γk[α]|+α42kLk2λkλkλk+1+o(α). (4.10)

where

sup0αα0|λ^k[α]λk[α]|=Op(1n),sup0αα0|γ^k[α]γk[α]|=Op(1n).

Remark

From, Corollary 4.1, it seems that smoothing (that is, α > 0) is unnecessary since when α = 0, we get the best order 1n. We clarify this problem by the following remarks.

  1. Both Silverman [15] and this paper consider the ideal situation where every sample curve is observed at all points in [a, b] without any noise or measurement error. Although in this situation the estimates are consistent when α = 0, smoothing is advantageous.
    • – First, because the “bias terms” and the “variation terms” are not the bias and the variation in the strict sense, they are correlated. Since the upper bounds on the right hand sides of (4.10) are the sums of the upper bounds for bias terms and variation terms, the upper bounds in (4.10) are actually for the cases in which bias terms and variation terms are positively correlated. They are the worst cases when we introduce smoothing. In some cases such as those in Section 6.3 of Silverman [15], the mean squared errors for some α > 0 are less than those for α = 0. For these cases, it is possible that bias terms and variation terms are negatively correlated and hence the estimate errors should be much less than the upper bounds in (4.10). Section 6.4 of Silverman [15] gave an optimal a with order O(1n) for estimates of eigenfunctions. By Corollary 4.1, if we choose the optimal a, we obtain the best asymptotic rates O(1n). Even for the worst cases, if we take α=O(1n), we can obtain the rate O(1n).
    • – Second, from a practical viewpoint, it is desirable that the estimates of principal component curves can keep main patterns of the true principal component curves. However, the sample curves of many stochastic process are nonsmooth or even discontinuous, such as examples in Remark (3) after Assumption 3. Hence, their sample covariance functions have many local variations and so do the eigenfunctions of those sample covariance functions. In these cases, the local variations can be removed by using an appropriate amount of smoothing, that is, choosing an appropriate positive α.
  2. In practice, people cannot observe the entire sample curves. The observations can only be made at discrete points often with noise or measurement error. The observation points could he dense or sparse. If the sample curves are smooth and the observation points are dense, we can obtain smoothed estiamte of each sample function and perform the usual functional PC A. This method cannot be applied to other situations. However, Silverman's method can be generalized to all these situations (see Qi and Zhao [11]). In our generalization, smoothing is essential and the smoothing parameters must be positive. The theoretical results in this paper has been applied to prove the consistency results in Qi and Zhao [11].

If α goes to 0 fast enough as n → ∞, we have the following asymptotic normalities.

Corollary 4.2

Under Assumptions 1 – 3, for any sequence {αn, n ≥ 1} with αn=op(1n), the joint distributions of

{n(λ^1[αn]λ1),n(λ^2[αn]λ2),,n(λ^K[αn]λK)}

converge to the same Gaussian distribution with mean zero. For any sequence {αn, n ≥ 1} with αn=op(1n), the joint distributions of

{n(γ^1[αn]γ1),n(γ^2[αn]γ2),,n(γ^K[αn]γK)}

converge to the same Gaussian distribution with mean zero.

Remark

Dauxois et al. [3] gave the asymptotic normalities of the eigenvalues and eigenfunctions of Γ̂n and characterized the covariance operators of the limit Gaussian random elements. Those results are special cases of Corollary 4.2 with all αn equal to zeros. Therefore, by Corollary 4.2, all the limit Guassian distributions in Corollary 4.2 are the same as those in Dauxois et al. [3].

5. Proofs

Proof of Theorem 3.1

By Remark (3) after Assumption 1, ∥Γ̂n∥ < ∞ a.s.. Fix a sample and α ≥ 0 such that ∥Γ̂n∥ < ∞. Consider the Hilbert space W22([a,b]) equipped with the inner product (·,·)α. For any f, gW22([a,b]), the functional (f, Γ̂ng) define a bilinear form in W22([a,b]) and

|(f,Γ^ng)|Γ^nfgΓ^nfαgα.

Hence, there is a unique bounded operator Ra in W22([a,b]), such that for any f, gW22([a,b]),

(f,Γ^ng)=(f,Rαg)α,

(see Section 84 in Riesz and Sz.-Nagy [13]). It is easy to see that Ra is symmetric and nonnegative-definite. We want to show that Ra is a compact operator (note that a compact operator is called completely continuous operator in Riesz and Sz.-Nagy [13]). By definition 4 in Section 85 of Riesz and Sz.-Nagy [13], we only need to show that for any bounded sequence {fmW22([a,b]),m}, one can select a subsequence {fmk} such that

(fmkfml,Rα(fmkfml))α=(fmkfml,Γ^n(fmkfml))0, (5.1)

as k, l → ∞. Because Γ̂n is a compact operator in L2([a,b]) (see Remark (2) after Assumption 1) and {fm} is also a bounded sequence in L2([a, b]), one can select a subsequence {fmk} such that {Γ̂nfmmk} converges, then (5.1) is true for {fmk}. Hence Ra is a compact operator. It has eigenvalues and eigenfunctions {(λ^j[α],γ^j[α]):j} with λ^1[α]λ^2[α]0. They are the solutions of the successive optimization problems (3.3) and (3.4) (see Chapter 3 of Weinberger [16]). Now for any γW22([a,b]) and any j ∈ ℕ, because

Rαγ^j[α]=λ^j[α]γ^j[α],

we have

(Γ^nγ^j[α],γ)=(Rαγ^j[α],γ)α=λ^j[α](γ^j[α],γ)α.

Proof of Theorem 4.1

The proof of the existence and uniqueness of the choices of the signs of γk[α], 1 ≤ kK making them continuous functions of α will be postponed to the proof of Theorem 4.2 because we need some technical lemmas in the proof of Theorem 4.2. We will assume that we can choose the signs of γk[α], 1 ≤ kK such that they are continuous function of α for all 0 ≤ α ≤ α0 and γk[0]=γk, 1 ≤ kK.

For any 1 ≤ k ≤ K, let Pk be the orthogonal projection operator in L2([a,b]) onto the space spanned by {γ1,… ,γk} and I be the identity operator in L2([a, b]). Then (I − Pk) is the orthogonal projection operator onto the closed subspace spanned by {γj,j ≥ (k + 1)}.

Lemma 1

For any k ∈ ℕ, and α1 ≥ α2 ≥ 0

λk[α1]λk[α2],λ^k[α1]λ^k[α2],k.

Proof. It follows Theorem 8.1 in Chapter 3 of Weinberger [16].

Lemma 2

For any 1 ≤ k ≤ K and α ≥ 0, we have

Pk1γk[α]2α2(λkλk1λk)2(k1)Lk2[γk[α],γk[α]]. (5.2)

Proof. For any j < k, by (3.6), we have

λj(γk[α],γj)=(γk[α],Γγj)=(Γγk[α],γj)=λk[α](γk[α],γj)α=λk[α]{(γk[α],γj)+α[γk[α],γj]}.

So

(λjλk[α])(γk[α],γj)=λk[α]α[γk[α],γj].

By Assumption 2 and Lemma 1, λj>λkλk[α]. Therefore,

(γk[α],γj)=λk[α]λjλk[α]α[γk[α],γj].

and we have

Pk1γk[α]2=Σj=1k1(γk[α],γj)2=Σj=1k1(λk[α]λjλk[α])2α2[γk[α],γj]2α2(λkλk1λk)2Σj=1k1[γk[α],γj]2α2(λkλk1λk)2Σj=1k1[γk[α],γk[α]][γj,γj]α2(λkλk1λk)2(k1)Lk2[γk[α],γk[α]],

where the last inequality in the second line follows from Cauchy-Schwarz in-equality.

Lemma 3

For any 1 ≤ k ≤ K and any

0α<1+4k(λk1λk)2(k1)λkΓ12kLk2,

(if k = 1, the right hand side is defined to be infinity), we have

[γk[α],γk[α]]kLk21αλk(λk1λk)2(k1)Lk2(1+αkLk2)Γ. (5.3)

Furthermore, if

0α1+2k(λk1λk)2(k1)λkΓ12kLk2,

(if k = 1, the right hand side is defined to he infinity), we have

[γk[α],γk[α]]2kLk2. (5.4)

For any α ≥ 0, we have

0λkλk[α]αkLk2λk. (5.5)

Hence, as α → 0, λk[α]λk.

Proof. Let span(γ1, … , γk) denote the linear subspace spanned by

{γ1,,γk}.

From Theorem 5.1 (Poincare's Principle) in Chapter 3 of Weinberger [16], we have

min0γspan(γ1,,γk)(γ,Γγ)γ2+α[γ,γ]λk[α]=(γk[α],Γγk[α])γk[α]2+α[γk[α],γk[α]]=(Pk1γk[α]+(IPk1)γk[α],Γ(Pk1γk[α]+(IPk1)γk[α]))γk[α]2+α[γk[α],γk[α]]=(Pk1γk[α],ΓPk1γk[α])+((IPk1)γk[α],Γ(IPk1)γk[α])γk[α]2+α[γk[α],γk[α]]ΓPk1γk[α]2+λkγk[α]2+α[γk[α],γk[α]], (5.6)

where the equality in the third line of (5.6) is true because that (I − Pk−1) is the orthogonal projection operator onto the closed subspace spanned by {γj,jk} which is orthogonal to span(γ1, … , γk-1), and both of them are invariant subspaces of Γ. The last inequality in (5.6) holds because the largest eigenvalue of Γ restricted to the closed subspace spanned by {γj,jk} is λk and the L2 norm of (IPk1)γk[α] is less than 1. On the other hand, we have

min0γspan(γ1,,γk)(γ,Γγ)γ2+α[γ,γ]=min0γspan(γ1,,γk)(γ,Γγ)γ2(1+α[γ,γ]γ2)min0γspan(γ1,,γk)(γ,Γγ)γ2(1+max0βspan(γ1,,γk)α[β,β]β2)=1(1+max0βspan(γ1,,γk)α[β,β]β2)min0γspan(γ1,,γk)(γ,Γγ)γ2=λk(1+max0βspan(γ1,,γk)α[β,β]β2)λk(1+αkLk2) (5.7)

The equality in the last line follows from the fact that the smallest eigenvalue of Γ in span(γ1, … , γk) is λk. The last inequality holds because that, for any β ∈ span(γ1, … , γk), let β=Σi=1kciγi, where c1, … , ck are some real numbers, then we have

[β,β]β2=[Σi=1kciγi,Σi=1kciγi]Σi=1kci2=Σi=1kci2[γi,γi]+Σjlcjcl[γj,γl]Σi=1kci2Σi=1kci2([γi,γi])2+Σjlcjcl[γj,γj][γl,γl]Σi=1kci2=(Σi=1kci[γi,γi])2Σi=1kci2(Σi=1kci2)(Σi=1k[γi,γi])Σi=1kci2Σi=1k[γi,γi]kLk2,

where the inequality in the second line is due to Cauchy-Schwarz inequality. Now from (5.6), (5.7) and Lemma 1, we have

λk(1+αkLk2)λk[α]λk.

From these inequalities, it can be derived that

0λkλk[α]αkLk2λk.

Therefore, λk[α]λk as α → 0.

Again by (5.6), (5.7), and note that γk[α]=1, we have

λk(1+αkLk2)ΓPk1γk[α]2+λkγk[α]2+α[γk[α],γk[α]]=ΓPk1γk[α]2+λk1+α[γk[α],γk[α]].

Then

λk(1+α[γk[α],γk[α]])ΓPk1γk[α]2(1+αkLk2)+λk(1+αkLk2),

hence,

λkα[γk[α],γk[α]]ΓPk1γk[α]2(1+αkLk2)+λkαkLk2.

.

Now by (5.2), we have

[γk[α],γk[α]]αλk(λk1λk)2(k1)Lk2[γk[α],γk[α]](1+αkLk2)Γ+kLk2.

After rearranging the terms, we then obtain

[γk[α],γk[α]]{1αλk(λk1λk)2(k1)Lk2(1+αkLk2)Γ}kLk2.

When the expression in braces on the left of the above inequality is positive, which is equivalent to

α<1+4k(λk1λk)2(k1)λkΓ12kLk2,

(if k = 1, the right hand side is denned to be infinity), we have

[γk[α],γk[α]]kLk21αλk(λk1λk)2(k1)Lk2(1+αkLk2)Γ. (5.8)

When

α1+2k(λk1λk)2(k1)λkΓ12kLk2,

(if k = 1, the right hand side is denned to be infinity), it can be shown that

1αλk(λk1λk)2(k1)Lk2(1+αkLk2)Γ12,

and then it follows from (5.8) that

[γk[α],γk[α]]2kLk2.

Lemma 4

For any 1 ≤ k ≤ K and any

0αλkλk+12kLk2λk, (5.9)

we have

(IPk)γk[α]22λkλk+1[Γα2(λkλk1λk)2(k1)Lk2[γk[α],γk[α]]+λkα[γk[α],γk[α]]12Lk] (5.10)

Proof. By the following orthogonal decomposition

γk[α]=Pk1γk[α]+(γk[α],γk)γk+(IPk)γk[α], (5.11)

we have

(γk[α],Γγk[α])=(Pk1γk[α],ΓPk1γk[α])+(γk[α],γk)2(γk,Γγk)+((IPk)γk[α],Γ(IPk)γk[α])ΓPk1γk[α]2+λk(γk[α],γk)2+λk+1(IPk)γk[α]2, (5.12)

where the last inequality follows from the fact that (IPk)γk[α] belongs to the closed subspace spanned by {γj, jk + 1} in which the largest eigenvalue of Γ is λk+1. On the other hand, by (3.6), we have

(γk[α],Γγk[α])=γk[α](γk[α],γk[α])α=γk[α]γk[α]2+αγk[α][γk[α],γk[α]]=λk[α]Pk1γk[α]2+λk[α](γk[α],γk)2+λk[α](IPk)γk[α]2+αλk[α][γk[α],γk[α]]. (5.13)

From (5.12) and (5.13),

λk[α]Pk1γk[α]2+λk[α](γk[α],γk)2+λk[α](IPk)γk[α]2+αλk[α][γk[α],γk[α]]ΓPk1γk[α]2+λk(γk[α],γk)2+λk+1(IPk)γk[α]2,

then

(λk[α]λk+1)(IPk)γk[α]2(Γλk[α])Pk1γk[α]2+(λkλk[α])(γk[α],γk)2αλk[α][γk[α],γk[α]](Γλk[α])Pk1γk[α]2+(λkλk[α])(γk[α],γk)2. (5.14)

It follows from (5.9) that αkLk2λk12(λkλk+1). Then by (5.5), we have

λkλk[α]12(λkλk+1),

hence,

λk[α]λk+112(λkλk+1). (5.15)

Because

λk(γk[α],γk)=(γk[α],Γγk)=(Γγk[α],γk)=λk[α](γk[α],γk)α=λk[α]{(γk[α],γk)+α[γk[α],γk]},

we have

(λkλk[α])(γk[α],γk)=λk[α]α[γk[α],γk]. (5.16)

From (5.14), (5.15) and (5.16),

12(λkλk+1)(IPk)γk[α]2ΓPk1γk[α]2+λk[α]α[γk[α],γk](γk[α],γk)ΓPk1γk[α]2+λk[α]α[γk[α],γk]ΓPk1γk[α]2+λk[α]α[γk[α],γk[α]]12[γk,γk]12.

Now by Lemma 2,

12(λkλk+1)(IPk)γk[α]2Γα2(λkλk1λk)2(k1)Lk2[γk[α],γk[α]]+λkα[γk[α],γk[α]]12Lk.

Now we can prove Theorem 4.1. It follows from the definition (4.3) of α0 that all the conditions in Lemmas 3 and 4 are satisfied. From the orthogonal decomposition

γk[α]=Pk1γk[α]+(γk[α],γk)γk+(IPk)γk[α],

we have

1=γk[α]2=Pk1γk[α]2+(γk[α],γk)2+(IPk)γk[α]2.

Hence, it follows from Lemma 2, Lemma 4 and (5.4) in Lemma 3 that

(γk[α],γk)2=1Pk1γk[α]2(IPk)γk[α]21α2(λkλk1λk)2(k1)Lk2[γk[α],γk[α]]2λkλk+1[Γα2(λkλk1λk)2(k1)Lk2[γk[α],γk[α]]+λkα[γk[α],γk[α]]12Lk]122kLk2λkλkλk+1α2k(k1)Lk4(λkλk1λk)2{1+2Γλkλk+1}α2. (5.17)

Define

a=2k(k1)Lk4(λkλk1λk)2{1+2Γλkλk+1},b=22kLk2λkλkλk+1. (5.18)

By solving the following inequalities,

aα2+bα12,α0,

we obtain 0αb2+2ab2a. Since

b2+2ab2a=1b2+2a+b12b2+2a122max{b2,2a}122min{1b,12a}

By the definition (4.3) of α0 and (5.18), we have

α0122min{1b,12a}b2+2ab2a.

Hence, for any 0 ≤ αα0, we have aα2+bα12. Now it follows from (5.17) that, for any 0 ≤ αα0,

(γk[α],γk)21bαaα2=12+(12bαaα2)12. (5.19)

Because γk[α] is a continuous function of α, (γk[α],γk) is also a continuous function of α and (γk[0],γk)=(γk,γk)=1. Hence, it follows from (5.19) that (γk[α],γk)>0 for all 0 ≤ αα0.

From (5.16), (5.17) and (5.4), we have

(λkλk[α])=λk[α]α[γk[α],γk](γk[α],γk)λk[α]αγk[α],γk(γk[α],γk)2λk[α]α[γk[α],γk[α]]12[γk,γk]12(γk[α],γk)2=2kLk2λkα(1+O(kLk2λkλkλk+1α+k(k1)Lk4λk2Γ(λk1λk)2(λkλk+1)α2)).

By (5.17) and (γk[α],γk)>0, we have

γk[α]γk2=2(1(γk[α],γk))2(1(γk[α],γk))(1+(γk[α],γk))=2(1(γk[α],γk)2),

and thus

γk[α]γka42kLk2λkλkλk+1+α4k(k1)Lk4(λkλk1λk)2{1+2Γλkλk+1}.

Proof of Theorem 4.2

We first study the properties of the “half-smoothing” operators Sα. At the end of Section 2, we know that Sα is a bounded linear operator from L2([a,b]) to L2([a,b]) with norm less than or equal to 1. Moreover, Sα is a one to one (injective) map. Hence, its inverse Sα1 exists. When α = 0, S0 is just the identity operator I in L2([a,b]). The following lemma gives the reason why Sα is called “half-smoothing” operators.

Lemma 5

The range of Sα (or the domain of Sα1) is W22([a,b]). Moreover, for any fW22([a,b]),

Sα1f2=fα2. (5.20)

Proof If α = 0, the results are trivial. Hence, we assume that α > 0. Since the space C[a, b] of smooth functions is dense in space

(W22([a,b]),α),

for any fW22([a,b]), there exists a sequence {fmC[a, b], m ∈ ℕ} such that ∥fm − fα → 0. One can see that the domain of Sα2=I+αLL contains C[a, b], hence C[a, b] is also in the domain of Sα1. Now we compute

Sα1flSα1fm2=(Sα1flSα1fm,Sα1flSα1fm)=(flfm,Sα2(flfm))=(flfm,(I+αLL)(flfm))=(flfm,flfm)+(flfm,αLL(flfm))=(flfm,flfm)+α(L(flfm),L(flfm))=(flfm,flfm)+α[flfm,flfm]=flfmα0, (5.21)

as m, l → ∞. Hence, {Sα1fm,m} is a Cauchy sequence in L2([a, b]). It converges to some function, say g, in L2([a, b]). Since Sα is a bounded operator, fm=SαSα1fm converges to Sαg in L2-norm. However, fm converges to f in ∥ · ∥α norm, it also converges in L2-norm. Therefore, Sαg = f, that is, f is in the range of Sα. Hence, W22([a,b]) is in the range of Sα. Because for any m ∈ ℕ, from a similar calculation as in (5.21),

Sα1fm2=fmα2,

and

Sα1fmSα1f0,fmfα0,

we have Sα1f2=fα2.

Now we show that the range of Sα is equal to W22([a,b]). Since we have shown that W22([a,b]) is in the range of Sα and Sα is a one-to-one map, we only need to show that the range of W22([a,b]) under Sα1 is L2([a, b]). By (5.20) and the completeness of (W22([a,b]),α), the range of W22([a,b]) under Sα1 is a closed subspace of L2([a, b]). If the range of W22([a,b]) under Sα1 is not L2([a, b]), then we can find 0 ≠ hL2([a, b]) such that

(h,Sα1f)=0,fW22([a,b]).

Since one can see that the domain of Sα2=I+αLL is contained in W22([a,b]), we have

(h,Sα1f)=0,fdomain ofSα2.

Then

(h,Sα1f)=(Sα1Sαh,Sα1f)=(Sαh,Sα2f)=0,fdomain ofSα2.

However, because the range of Sα2 is the whole L2([a, b]), we have Sαh = 0. Hence h = 0 since Sα is a one-to-one map. We get a contradiction. Therefore, the range of Sα is equal to W22([a,b]).

Lemma 6

{(λ^j[α],Sα1γ^j[α]):j} and {(λj[α],Sα1γj[α]):j} are eigenvalues and eigenfunctions of the compact operators SαΓ̂nSα and SαΓ̂Sα in L2([a, b]) respectively. Moreover, there are no other eigenvalues for SαΓ̂nSα and SαΓ̂Sα.

Note that the L2 norms of Sα1γ^j[α] and Sα1γj[α] may not be 1.

Proof. If α = 0, the results are trivial. Hence, we assume that α > 0. Because {(λ^j[α],γ^j[α]):j} are solutions of the successive optimization problems (3.3) and (3.4), then by Lemma 5,

(Sα1γ^1[α],SαΓ^nSαSα1γ^1[α])Sα1γ^1[α]2=(γ^1[α],Γ^nγ^1[α])γ^1[α]α2=λ^1[α]=max0γW22([a,b])(γ,Γ^nγ)γα2=max0γW22([a,b])(Sα1γ,SαΓ^nSαSα1γ)Sα1γ2=max0βL2([a,b])(β,SαΓ^nSαβ)β2.

Hence, (λ^1[α],Sα1γ^1[α]) are the first eigenvalue and the corresponding eigenfunction of SαΓ̂nSα. Similarly, we can prove the conclusions for other eigenvalues and eigenfunctions.

Define

H=theBanachspaceofallcompactboundedoperatorsfromL2([a,b])toL2([a,b])withnormdefinedin(2.1). (5.22)

For the definition and properties of compact operators in Banach spaces, we refer reader to Chapter 21 in Lax [9]. Define a sequence of stochastic processes

{Zn(α)=n(SαΓ^nSαSαΓSα),n,0αα0},

which is indexed by α and takes values in H because both Γ̂n and Γ are compact operators and Sα is a bounded operator. Note that Zn(0)=n(Γ^nΓ). We follow the notations in Dauxois et al. [3]. Let F denote the space of Hilbert-Schmidt operators from L2([a, b]) to L2([a, b]). Then F is a Hilbert space with a inner product denoted by < ·, · >F. By Assumption 1,

E[X4]<.

Thus Γ̂n, Γ ∈ F. It follows from Proposition 5 in Dauxois et al. [3] that {Zn(0), n ∈ ℕ}, regarded as a sequence of random elements with values in F, converges in distribution to the Gaussian random element in F with mean 0 and covariance operator Q, where

Q=E[(XXΓ)~(XXΓ)]=E[(XX)~(XX)]Γ~Γ. (5.23)

XX denotes the bounded operator from L2([a, b]) to L2([a, b]) with (XX) (γ) = (γ, X)X for any γL2([a, b]). Γ⊗̃Γ denotes the bounded operator from F to F with (Γ⊗̃Γ)(Λ) = 〈Λ,Γ〉F Γ for any Λ ∈ F. The other terms in (5.23) are denned similarly. Note that according to the definition (5.23), Q is an operator from F to F. However, because F is a Hilbert space, there is an isometry between F and its dual space F′. Hence, Q can be regarded as a bounded operator from F′ to F and then it satisfies the definition of covariance operators in Remark (1) after Theorem 4.2. However, in this paper, we will consider the space H of compact operators which is larger than the space F of Hilbert-Schmidt operators (every Hilbert-Schmidt operator is compact). In the proof of Proposition 6 in Dauxois et al. [3], the authors used the fact that if A is a Hilbert-Schmidt operator, then (AzI)−1 is also a Hilbert-Schmidt operator, where z is a complex which is not an eigenvalue of A and I is the identity operator. However, this is not true in general. But (AzI)−1 is a bounded operator. Because the norm (2.1) in H is smaller than the norm in F, the embedding map i : FH (i maps any Hilbert-Schmidt operator to itself) is a bounded operator. Then we have

Lemma 7

{Zn(0), n ∈ ℕ}, regarded as a sequence of random, elements with values in H, converges in distribution to a Gaussian random element in H with mean zero and covariance operator iQi*, where i* is the adjoint operator of i and Q is defined in (5.23).

Proof. It follows immediately from the following lemma.

Lemma 8

Suppose that {Xn, n ≥ 1} is a sequence of random, elements with values in a Banach space B. If Xn converges in distribution to a Gaussian random element X with mean zero and covariance operator Λ. Let T be a bounded operator (that is, a continuous linear function) from B to another Banach space C. Then T(Xn) converges in distribution to T(X) which is also a Guassian random element with mean zero and covariance operator TΛT*, where T* is the adjoint operator of T.

Proof. Since T is a continuous map from B to C, by continuous mapping theorem, T(Xn) converges in distribution to T(X). Now we show that T(X) is an Guassian random element. For any bounded linear functional fC′, fοTB′. Hence, f(T(X)) = f ο T(X) is a Gaussian random variable since X is Gaussian. Thus T(X) is Guassian and obviously its mean is zero. In order to compute it covariance operator, we intruduce the following notations. For any xB, yC and fB′, gC′, define 〈x, fB = f(x), 〈y, gC = g(y). By the definition of covariance operators (see Remark (1) after Theorem 4.2) and the definition of adjoint operators, for any g, hC′,

E[g(T(X))h(T(X))]=E[(gT(X))(hT(X))]=E[(T(g)(X))(T(f)(X))]=Λ(T(g)),T(f)B=ΛT(g),T(f)B=TΛT(g),fC.

Therefore, the covariance operator of TX is TΛT*.

Lemma 9

For any finite 0 ≤ α1 < … < αk ≤ α0, the sequence

{(Zn(α1),,Zn(αk)),n}

converges in distribution to a Gaussian random element with values in Hk and mean zero, where Hk is the product space of k copies of H.

Proof. This lemma follows from Lemma 8 and the fact that

(Zn(α1),,Zn(αk))=(Sα1Zn(0)Sα1,,SαkZn(0)Sαk)

is a continuous and linear function of Zn(0) since Sα1, i = 1,… , k are bounded operators.

Unfortunately, Sα is not continuous as α → 0 under the norm (2.1). For example, let

[a,b]=[0,2π],fn(t)=12πeint.

By (5.20),

Sα1fn2=fnα2=fn2+α[fn,fn]=1+αn4.

Define gn=11+αn4Sα1fn. Then ∥gn∥ = 1 and

(SαI)gn=11+αn4fngngn11+αn4fn=111+αn4.

Therefore, ∥Sα − I∥ ≥ 1 for all α. Note that S0 = I. However, we have the following results.

Lemma 10

For any f ∈ L2([a, b]), α → Sαf is a continuous map from [0, α0] to L2([a, b]).

Proof. Let E be the resolution of the identity for the self-adjoint operator Sα0 (for reference, see Chapter 12 of Rudin [14]). Because Sα0 is a positive operator with ∥Sα0∥ ≤ 1, Ef, f is a bounded positive Borel measure in [0, 1]. Fix α ∈ [0, α0].

Sα=(I+αLL)12=((1αα0)I+αα0(I+α0LL))12=((1αα0)I+αα0Sα02)12=Sα0(αα0+(1αα0)Sα02)12.

Now define a family continuous functions on [0, 1],

φα(x)={xαα0+(1αα0)x2,0<αα01α=0,},

then Sα = φα(Sα0). Let α′ ∈ [0, α0] and α′α. It follows from Theorem 12.21 and 12.23 in Chapter 12 of Rudin [14] that

(SαSα)f2=01(φα(x)φα(x))2dEf,f(x).

The integrand on the right hand side is bounded. If α ≠ 0, the integrand converges to 0 at each point in [0, 1] as α′α. By the bounded convergence theorem, ∥(Sα′Sα)f∥2 → 0. If α = 0, the integrand converges to 0 at each point in [0, 1] except 0. If we can show that the measure value Ef, f({0}) of Ef, f on the set {0} is zero, then by the bounded convergence theorem, we still have ∥(Sα′Sα) f2 → 0. In fact, for any gL2([a, b]),

(g,Sα0E({0})f)={0}xdEg,f(x)=0.

Hence, Sα0 E({0})f = 0. Because Sα0 is a one-to-one operator, E({0})f = 0. Therefore,

Ef,f({0})=(f,E({0})f)=0.

Lemma 11

For any compact operator Λ in L2([a, b]), α → SαΛSα is a continuous map from [0, α0] to H.

Proof. By Lemma 11 in Section XI.9 of Dunford and Schwartz [5], there exists a sequence Λm of bounded operators having finite-dimensional range, such that ∥Λm − Λ∥ → 0. If we can show that for each m, αSαΛmSα is a continuous map, then since ∥SαΛmSαSαΛSα∥ ≥ ∥Λm − Λ∥ → 0 uniformly, αSαΛSα is continuous. Now fix m and 0 ≤ αα0. Let {e1, …, ek} be an orthonormal basis of the range of Λm and α′α. For any fL2 ([a, b]) with ∥f∥ ≤ 1,

SαΛmSαfSαΛmSαf=(SαSα)ΛmSαf+SαΛm(SαSα)f(SαSα)ΛmSαf+SαΛm(SαSα)f=(SαSα)Λm(SαSα)f+(SαSα)ΛmSαf+SαΛm(SαSα)f(SαSα)Λm(SαSα)f+(SαSα)ΛmSαf+SαΛm(SαSα)f3Λm(SαSα)f+(SαSα)ΛmSαf.

Because

ΛmSαf=Σi=1k(ΛmSαf,ei)ei
(SαSα)ΛmSαfΣi=1k|(ΛmSαf,ei)|(SαSα)eiΣi=1kΛm(SαSα)ei

which converges to 0 uniformly for all fL2([a, b]) with ∥f∥ ≤ 1 by Lemma 10. Now

Λm(SαSα)f2=Σi=1k|(Λm(SαSα)f,ei)|2=Σi=1k|(f,(SαSα)Λmei)|2Σi=1k(SαSα)Λmei2.

which converges to 0 uniformly for all fL2([a, b]) with ∥f∥ ≤ 1 by Lemma 10, where Λm* is the adjoint operator of Λm. Hence, ∥Sα′ΛmSα′SαΛmSα∥ → 0.

In the next lemma, we assume that all the eigenfunctions have norms 1.

Lemma 12

Suppose that α → Λ(α) is a continuous map from [0, α0] to the suhspace of positive compact operators in L2([a, b]) in H. Assume that the first K eigenvalues of Λ(α) for any α ∈ [0, α0] are positive and mutually different, and each of them has multiplicity 1. Then given the first K eigenfunctions {ek[0],1kK} of Λ(0), there exist unique choices of the first k eigenfunctions {ek[α],1kK} of Λ(α) for any α ∈ (0, α0] such that αek[α] is a continuous map from [0, α0] to L2([a, b]) for any 1 ≤ k ≤ K.

Note that for each 1 ≤ kK and 0 ≤ αα0, there exist two eigenfunctions with norm 1 of Λ(α) corresponding its k-th eigenvalues and any one of the two eigenfunctions is equal to the other one multiplied by −1.

Proof. Let μ1[α]>>μK[α]>0 be the first K eigenvalues of Λ(α). Let Ek(α) be the orthogonal projection onto the space spanned by the ek[α], 1 ≤ kK, 0 ≤ αα0. Note Ek(α) does not depend on the sign of ek[α].

We first show that for any 1 ≤ kK, Ek(α) is a continuous function of from [0, α0] to H. For any fixed α ∈ [0, α0], we can find a small positive number εα such that the K + 1 intervals

[μ1[α]α,μ1[α]+α],[μ2[α]α,μ2[α]+α],,[μK+1[α]α,μK+1[α]+α]

are disjoint. Since Λ(α) is a continuous function, we can choose a neighborhood ℳα of α in [0, α0], such that for any α′ ∈ ℳα

max1kK+1|μk[α]μk[α]|Λ(α)Λ(α)α4.

where the first inequality follows from Corollary 4 in Section XI.9 of Dunford and Schwartz [5]. Now we define K circles on the complex plane ℂ,

Ck=the circle with centerμk[α]and radiusα,1kK.

Then one can see that for any α′ ∈ ℳα, the disk bounded by the circle Ck only contains the k-th eigenvalues μk[α] of Λ(α′). Hence, we have (see Section VII.3 of Dunford and Schwartz [4] or Definition 10.26 in Rudin [14])

Ek(α)=12πiCk(zIΛ(α))1dz,

for any α′ ∈ ℳα. Since (zI − Λ(α′))−1 is a continuous function of zCk Ck is a compact set, we have

M=supzCk(zIΛ(α))1<. (5.24)

Since Λ(α) is a continuous function of α, for any 0 < δ < 1, we can find a neighborhood 𝒩α of α such that

Λ(α)Λ(α)δM,αNα. (5.25)

Now for any α′ ∈ ℳα ⋂ 𝒩α,

Ek(α)Ek(α)12πCk(zIΛ(α))1(zIΛ(α))1dz=12πCk(zIΛ(α)(Λ(α)Λ(α)))1(zIΛ(α))1dz=12πCk(zIΛ(α))1(I(Λ(α)Λ(α))(zIΛ(α))1)1(zIΛ(α))1dz12πCk(zIΛ(α))1(I(Λ(α)Λ(α))(zIΛ(α))1)1IdzM2πCk(I+Σk=1[(Λ(α)Λ(α))(zIΛ(α))1]k)IdzM2πCkΣk=1[Λ(α)Λ(α))(zIΛ(α))1]kdzM2πCkdzΣk=1[δMM]k (5.26)

by (5.24) and (5.25)

=M2πCkdzδ1δ

Since δ can be arbitrarily small, Ek(α) is continuous at α.

Now we show that for any given α ∈ [0, α0], and given ek[α], there exists a neighborhood [α1, α2] of α such that for any α′ ∈ [α1, α2], we can uniquely choose ek[α] such that ek[α] is continuous in this neighborhood. Because Ek(α′) is a continuous function of α′, Ek(α)ek[α] is a continuous function of α′ and its value is 1 at α′ = α. Hence, we can find a neighborhood [α1, α2] of α such that Ek(α)ek[α]12 for α′ ∈ [α1, α2]. Then

ek[α]=Ek(α)ek[α]Ek(α)ek[α],

are eigenfunctions and continuous in [α1, α2]. Now we show the uniqueness. Suppose e~k[α], α′ ∈ [α1, α2] is another choice of the eigenfunctions such that it is continuous and e~k[α]=ek[α]. If for some α[α1,α2], ek[α]e~k[α], we have ek[α]=e~k[α]. Since both the inner products (ek[α],ek[α]) and (ek[α],e~k[α]) are continuous functions for α′ ∈ [α1, α2]. By the choice of [α1, α2], |(ek[α],ek[α])|=|(ek[α],e~k[α])|12. Because (ek[α],ek[α])=(ek[α],e~k[α]), one of them must be negative. Without loss of generality, we assume that (ek[α],ek[α])<0. Since (ek[α],ek[α])=1>0, it follows from the intermediate value theorem that there is at least one point α‴ between α and α″ such that (ek[α],ek[α])=0. However, it is impossible because

|(ek[α],ek[α])|=Ek(α)ek[α]12.

Hence we have proved the uniqueness.

Fix ek[0]. Let the set

V={α[0,α0]:we can uniquely chooseek[α]forα[0,α0]such thatek[α]is continous in[0,α]}.

By the arguments in the last paragraph, 𝒱 is nonempty. Now we show that the set 𝒱 is an open set. Suppose that α* is any point in 𝒱. It follows from the last paragraph that there exists a neighborhood [α1, α2] of α* such that given e[α*], we can uniquely choose the sign of e[α] for any α ∈ [α1, α2] to make e[α], α ∈ [α1, α2] a continuous function. We show that [α1, α2] ⊂ 𝒱. Let α** be any point in [α1, α2]. It is easy to see that we can choose the signs of e[α] for all α ∈ [0, α**] such that e[α] is a continuous function of α in [0, α**]. We only need to show the uniqueness of e[α]. The uniqueness is obvious if α** ≥ α* since α* ∈ 𝒱. Hence we assume that α** < α*. We will proceed by contradiction. Assume that there are two different continuous functions e~[α] and e[α], 0 ≤ αα**. By the definition of [α1, α2], we can choose a continuous function e^[α], α** ≤ αα*. Define

e[α]={e~[α]if0ααe^[α]ifαααande~[α]=e^[α]e^[α]ifαααande~[α]=e^[α]},

and

e[α]={e[α]if0ααe^[α]ifαααande[α]=e^[α]e^[α]ifαααande[α]=e^[α]}

Then e~[α] and e[α] are two different continuous functions in [0, α*], which contradicts to α* ∈ 𝒱. Hence, 𝒱 is an open set.

Now if we can prove that 𝒱 is also a closed set, we have 𝒱= [0, α0]. Let αm ∈ 𝒱 be a sequence of positive numbers converging to α ∈ [0, α0]. If for some m, αmα it is obvious that α ∈ 𝒱. Hence we assume that αm < α for all m. Then we can uniquely choose the signs of of ek[α] such that ek[α] is continuous in [0, α). Let ek[α] be one of the two eigenfunctions with norm 1. Because for any α′ < α

|(ek[α],ek[α])21|=|(ek[α],Ek(α)ek[α])(ek[α],Ek(α)ek[α])|Ek(α)Ek(α)

goes to zero as α′α, (ek[α],ek[α])21. Since ek[α] continuous in [0, α). (ek[α],ek[α]) converges either to 1 or −1. In the latter case, we change ek[α] to ek[α]. Hence, without loss of generality, we assume that (ek[α],ek[α])1 as α′α. Now one can see that ek[α] is continuous on [0, α] and its uniqueness is obvious. Hence, α ∈ 𝒱. We have proven that 𝒱 is a close set.

Define CH [0, α0] to be the space of all the continuous function from [0, α0] → H (see Chapter 3 of Billingsley [1]). For any {Λ(α) : 0 ≤ αα0} ∈ CH[0, α0], define a norm

Λ=sup0αα0Λ(α). (5.27)

Under the norm (5.27), CH[0, α0] is a Banach space. Recall the definition

{Zn(α)=n(SαΓ^nSαSαΓSα),n,0αα0}.

By Lemma 11, we can regard the stochastic processes Zn in [0, α] as random elements with values in CH [0, α0]. Define a linear map Θ: HCH[0, α0] such that for any compact operator UH,

Θ(U)={SαUSα,0αα0}. (5.28)

Lemma 13

Θ is a bounded operator and the sequence {Zn, n ∈ ℕ} of stochastic processes with sample paths in CH[0, α0] converges in distribution to the Gaussian random, element with mean zero and covariance operator ΘiQi*Θ*.

Proof. Since the norm of Sα is less than or equal to 1, for any VH,

sup0αα0SαUSαSαVSαUV.

Hence, the map (5.28) is continuous and hence a bounded operator. Since Zn = Θ(Zn(0)), the lemma follows from Lemmas 7 and 8.

Now for any 1 ≤ kK, define

η^k[α]=Sα1γ^k[α]Sα1γ^k[α],ηk[α]=Sα1γk[α]Sα1γk[α]. (5.29)

Note that by Lemma 6, η^k[α] and ηk[α] are the eigenfunctions of SαΓ̂Sα and SαΓSα with norms 1. By (5.29) and because γ^k[α]=1 and γk[α]=1, we have

Sαη^k[α]=1Sα1γ^k[α],Saηk[α]=1Sα1γk[α], (5.30)

and

γ^k[α]=Sαη^k[α]Sαη^k[α],γk[α]=Sαηk[α]Sαηk[α]. (5.31)

Define ~k=λkλk+14, 1 ≤ kK, and εK = min1≤kK ε̃k. Then the K + 1 intervals

[λ1~1,λ1+~1],[λ2~2,λ2+~1],,[λK~K,λK+~K1],[λK+1~K,λK+1+~K], (5.32)

are disjoint. By the definition (4.3) of α0 and (5.5) in Lemma 3, for any 0 ≥ αα0 and 1 ≥ kK,

0λkλk[α]αkLk2λkα0kLk2λkλkλk+116kLk2λkkLk2λk~k4. (5.33)

Hence, λ1[α],,λK[α] are different mutually for all 0 ≥ αα0. Now given γk, 1 ≤ kK, by Lemma 11 and Lemma 12, we can uniquely choose the first K eigenfunctions {ηk[α],1kK} of SαΓSα such that ηk[0]=γk and ηk[α], 1 ≤ kK, are continuous functions of α. We have proved the claims about the continuity of γk[α], 1 ≤ kK at the beginning of the proof of Theorem 4.1.

Now we define K circles in the complex plane ℂ,

C1=the circle with centerλ1and radius~1,1kK,Ck=the circle with centerλk+~k1~k2and radius~k1+~k2,1kK. (5.34)

Note that the K discs bounded by Ck, 1 ≤ kK are disjoint and the intersections between these discs and the real line in the complex plane are just the first K intervals in (5.32). Let Ek(α) be the orthogonal projection onto the space spanned by the ηk[α], 1 ≤ kK, 0 ≤ αα0. Now because it follows from (5.33) that for any 0 ≤ αa0, 1 ≤ kK, the disk bounded by the circle Ck only contains the k-th eigenvalues λk[α] of SαΓSα for any 0 ≤ α ≤ a0, 1 ≤ kK, we have

Ek(α)=12πiCk(zISαΓSα)1dz. (5.35)

By Lemma 11, SaΓSα is a continuous function of α. Hence, by a similar calculation as in (5.26), it can be shown that Ek(α) is a continuous function of α.

Recall that we define in (4.6)

Ω0={ω:there exists at least oneα[0,α0]such thatλ^1[α],,λ^K[α]are not mutually different}.

Lemma 14

Ω0 is a measurable set and P0) → 0 as n → ∞.

Proof. Consider the subset

ε={B:Bis a positive compact operator, its firstKeigenvalues are mutually different and each of them has multiplicity1}.

ℰ is an open subset of the space of all positive compact operators, which is a closed subset of ℋ; hence ℰ is measurable. Let (Ω, ℱ) be the probability space and ([0, α_0], ℬ[0, α_0]) the Lebesgue space. Since S_αΓ̂_nS_α has continuous sample paths, it is jointly measurable on (Ω × [0, α_0], ℱ × ℬ[0, α_0]). One can see that Ω_0 is the projection onto Ω of the set {(ω, α) : S_αΓ̂_nS_α ∉ ℰ}. Therefore, Ω_0 is measurable, and so is Ω_0^c. By (5.33) and the definition of ε_K (just above (5.32)), we have

$$\Omega_0 \subset \Bigl\{\sup_{0\le\alpha\le\alpha_0}\ \max_{1\le k\le K+1}\bigl|\hat\lambda_k^{[\alpha]} - \lambda_k^{[\alpha]}\bigr| > \frac{\varepsilon_K}{4}\Bigr\}.$$

By Corollary 4 in Section XI.9 of Dunford and Schwartz [5],

$$\sup_{0\le\alpha\le\alpha_0}\ \max_{1\le k\le K+1}\bigl|\hat\lambda_k^{[\alpha]} - \lambda_k^{[\alpha]}\bigr| \le \sup_{0\le\alpha\le\alpha_0}\bigl\|S_\alpha\hat\Gamma_n S_\alpha - S_\alpha\Gamma S_\alpha\bigr\| \le \bigl\|\hat\Gamma_n - \Gamma\bigr\|. \qquad (5.36)$$

Hence,

$$P(\Omega_0) \le P\Bigl(\bigl\|\hat\Gamma_n - \Gamma\bigr\| > \frac{\varepsilon_K}{4}\Bigr) \to 0 \qquad (5.37)$$

by the law of large numbers.

For any ω ∈ Ω_0, define Ê_{nk}(α) to be zero. For any ω ∉ Ω_0, define Ê_{nk}(α) to be the orthogonal projection onto the space spanned by the k-th eigenfunction η̂_k^{[α]} of S_αΓ̂_nS_α (note that Ê_{nk}(α) does not depend on the sign of η̂_k^{[α]}). By the same argument as in the proof of Lemma 12, we can show that Ê_{nk}(α) is a continuous function of S_αΓ̂_nS_α, so it is measurable and continuous in α. Now let {e_m, m ∈ ℕ} be a complete orthonormal basis of L²([a, b]), and choose

$$\hat\eta_k^{[0]} = \frac{\hat E_{nk}(0)\gamma_k}{\bigl\|\hat E_{nk}(0)\gamma_k\bigr\|}\,\chi_{\{\hat E_{nk}(0)\gamma_k \ne 0\}} + \sum_{m=1}^{\infty}\frac{\hat E_{nk}(0)e_m}{\bigl\|\hat E_{nk}(0)e_m\bigr\|}\,\chi_{\{\hat E_{nk}(0)\gamma_k = 0,\ \hat E_{nk}(0)e_j = 0,\ 1\le j\le m-1,\ \hat E_{nk}(0)e_m \ne 0\}} \qquad (5.38)$$

in Ω0c and 0 in Ω0, where χ is the indicator function. Then η^k[0] is measurable and

$$\bigl(\hat\eta_k^{[0]},\,\eta_k^{[0]}\bigr) \ge 0. \qquad (5.39)$$

Now by Lemma 11, Lemma 12 and the definition of Ω_0, for any ω ∉ Ω_0, we can uniquely choose η̂_k^{[α]}, 1 ≤ k ≤ K, such that η̂_k^{[α]}, 1 ≤ k ≤ K, are continuous functions of α. The measurability of η̂_k^{[α]} follows from the next lemma. By (5.31), γ̂_k^{[α]}, 1 ≤ k ≤ K, are continuous and measurable with (γ̂_k^{[0]}, γ_k) ≥ 0.

Lemma 15

For any 1 ≤ k ≤ K, η̂_k^{[·]} is a measurable map to C_{L²([a,b])}[0, α_0].

Proof. In Ω_0^c, ‖Ê_{nk}(α)η̂_k^{[0]}‖ is a continuous function of α. Since ‖Ê_{nk}(0)η̂_k^{[0]}‖ = 1, let T̂^{(1)} = inf{α : ‖Ê_{nk}(α)η̂_k^{[0]}‖ ≤ 1/2} ∧ α_0 in Ω_0^c. In Ω_0, define T̂^{(1)} = 0. Then T̂^{(1)} is a nonnegative random variable. By Lemma 12, we have in Ω_0^c, if α ≤ T̂^{(1)},

$$\hat\eta_k^{[\alpha]} = \frac{\hat E_{nk}(\alpha)\hat\eta_k^{[0]}}{\bigl\|\hat E_{nk}(\alpha)\hat\eta_k^{[0]}\bigr\|}.$$

Define a random element

$$\zeta_1 = \frac{\hat E_{nk}(\hat T^{(1)})\hat\eta_k^{[0]}}{\bigl\|\hat E_{nk}(\hat T^{(1)})\hat\eta_k^{[0]}\bigr\|}$$

in Ω_0^c and 0 in Ω_0. Define a random variable T̂^{(2)} = inf{α ≥ T̂^{(1)} : ‖Ê_{nk}(α)ζ_1‖ ≤ 1/2} ∧ α_0 and a random element

$$\zeta_2 = \frac{\hat E_{nk}(\hat T^{(2)})\zeta_1}{\bigl\|\hat E_{nk}(\hat T^{(2)})\zeta_1\bigr\|}$$

in Ω_0^c and 0 in Ω_0. Similarly, we can define (T̂^{(3)}, ζ_3), …. One can show that for any ω ∈ Ω_0^c, only finitely many of the T̂^{(m)}(ω), m = 0, 1, 2, …, are less than α_0, where T̂^{(0)}(ω) = 0. Hence in Ω_0^c, we have

$$\hat\eta_k^{[\alpha]} = \sum_{m=0}^{\infty}\frac{\hat E_{nk}(\alpha)\zeta_m}{\bigl\|\hat E_{nk}(\alpha)\zeta_m\bigr\|}\,\chi_{[\hat T^{(m)},\,\hat T^{(m+1)})}(\alpha),$$

where ζ_0 = η̂_k^{[0]} and χ is the indicator function. Hence, η̂_k^{[α]} is measurable.
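The stopping-time construction above is, in effect, a rule for selecting eigenvector signs continuously in α. A minimal numerical sketch of the same idea (ours; the matrix family and all names are illustrative assumptions, not the paper's objects): along a grid of α values, normalize the projection of a running reference vector onto the current eigenspace, and restart the reference, as with the stopping times T̂^{(m)}, whenever the projection norm falls to 1/2 or below.

```python
import numpy as np

# Sketch of the selection rule in Lemma 15 on a grid: keep the normalized
# projection of a reference vector zeta onto the current top eigenspace, and
# restart zeta when the projection norm falls to 1/2 or below.
def continuous_eigvec_path(A, alphas):
    path, zeta = [], None
    for a in alphas:
        _, V = np.linalg.eigh(A(a))
        v = V[:, -1]                     # top eigenvector (sign arbitrary)
        P = np.outer(v, v)               # projection onto its span
        zeta = v if zeta is None else zeta
        p = P @ zeta
        if np.linalg.norm(p) <= 0.5:     # restart the reference vector
            zeta = path[-1]
            p = P @ zeta
        path.append(p / np.linalg.norm(p))
    return np.array(path)

def A(a):                                # toy one-parameter matrix family
    u = np.array([np.cos(a), np.sin(a)])
    return 2.0 * np.outer(u, u) + np.eye(2)

path = continuous_eigvec_path(A, np.linspace(0.0, 1.0, 50))
# Consecutive selected vectors never flip sign: the selected path is continuous.
print(np.all(np.einsum('ij,ij->i', path[:-1], path[1:]) > 0.0))   # True
```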

By (5.33) and (5.36), on the event {‖Γ̂_n − Γ‖ ≤ ε_K/4} ∩ Ω_0^c, for any 0 ≤ α ≤ α_0 and 1 ≤ k ≤ K, the disc bounded by the circle C_k contains only the k-th eigenvalues of S_αΓ̂_nS_α and S_αΓS_α. Hence, on this event, for any 0 ≤ α ≤ α_0, 1 ≤ k ≤ K, we have

$$E_k(\alpha) = \frac{1}{2\pi i}\oint_{C_k}\bigl(zI - S_\alpha\Gamma S_\alpha\bigr)^{-1}\,dz, \qquad \hat E_{nk}(\alpha) = \frac{1}{2\pi i}\oint_{C_k}\bigl(zI - S_\alpha\hat\Gamma_n S_\alpha\bigr)^{-1}\,dz. \qquad (5.40)$$

The proofs of the following Lemma 16 and Lemma 17 follow the ideas of Section 2 in Dauxois et al. [3]. Define linear maps ϕ_k : C_H[0, α_0] → C_H[0, α_0], 1 ≤ k ≤ K, such that for any Λ ∈ C_H[0, α_0] and 0 ≤ α ≤ α_0,

$$(\phi_k(\Lambda))(\alpha) = \frac{1}{2\pi i}\oint_{C_k}\bigl(zI - S_\alpha\Gamma S_\alpha\bigr)^{-1}\,\Lambda(\alpha)\,\bigl(zI - S_\alpha\Gamma S_\alpha\bigr)^{-1}\,dz, \qquad (5.41)$$

where (ϕ_k(Λ))(α) denotes the value of ϕ_k(Λ) at the point α. Then define Φ_K = (ϕ_1, ϕ_2, …, ϕ_K), which is a linear map from C_H[0, α_0] to ⊕_{k=1}^K C_H[0, α_0]. One can verify that the ϕ_k's are continuous. Hence Φ_K is a bounded operator.

Lemma 16

The sequence {√n(Ê_{nk} − E_k), 1 ≤ k ≤ K}_n of stochastic processes has sample paths in ⊕_{k=1}^K C_H[0, α_0] a.s. and converges in distribution to a Gaussian random element with mean zero and covariance operator Φ_K Θ i Q i*Θ*Φ_K*.

Proof. On the event {‖Γ̂_n − Γ‖ ≤ ε_K/4}, for each z ∈ C_k,

$$\bigl(zI - S_\alpha\hat\Gamma_n S_\alpha\bigr)^{-1} = \Bigl(\bigl(zI - S_\alpha\Gamma S_\alpha\bigr) - \bigl(S_\alpha\hat\Gamma_n S_\alpha - S_\alpha\Gamma S_\alpha\bigr)\Bigr)^{-1} = \bigl(zI - S_\alpha\Gamma S_\alpha\bigr)^{-1}\Bigl(I - \bigl(S_\alpha\hat\Gamma_n S_\alpha - S_\alpha\Gamma S_\alpha\bigr)\bigl(zI - S_\alpha\Gamma S_\alpha\bigr)^{-1}\Bigr)^{-1}. \qquad (5.42)$$

If

$$\sup_{0\le\alpha\le\alpha_0}\bigl\|S_\alpha\hat\Gamma_n S_\alpha - S_\alpha\Gamma S_\alpha\bigr\| \le \bigl\|\hat\Gamma_n - \Gamma\bigr\| \le \frac{\tilde\varepsilon}{2},$$

where

$$\tilde\varepsilon^{-1} = \max_{1\le k\le K}\ \sup_{0\le\alpha\le\alpha_0}\ \sup_{z\in C_k}\bigl\|(zI - S_\alpha\Gamma S_\alpha)^{-1}\bigr\| < \infty,$$

then by (5.42), we have an absolutely convergent series expansion

$$\bigl(zI - S_\alpha\hat\Gamma_n S_\alpha\bigr)^{-1} = \bigl(zI - S_\alpha\Gamma S_\alpha\bigr)^{-1}\sum_{m=0}^{\infty}\Bigl(\bigl(S_\alpha\hat\Gamma_n S_\alpha - S_\alpha\Gamma S_\alpha\bigr)\bigl(zI - S_\alpha\Gamma S_\alpha\bigr)^{-1}\Bigr)^{m}.$$

Hence,

$$\bigl(zI - S_\alpha\hat\Gamma_n S_\alpha\bigr)^{-1} - \bigl(zI - S_\alpha\Gamma S_\alpha\bigr)^{-1} = \bigl(zI - S_\alpha\Gamma S_\alpha\bigr)^{-1}\bigl(S_\alpha\hat\Gamma_n S_\alpha - S_\alpha\Gamma S_\alpha\bigr)\bigl(zI - S_\alpha\Gamma S_\alpha\bigr)^{-1} + \hat U_{\alpha n}(z), \qquad (5.43)$$

where

$$\hat U_{\alpha n}(z) = \bigl(zI - S_\alpha\Gamma S_\alpha\bigr)^{-1}\sum_{m=2}^{\infty}\Bigl(\bigl(S_\alpha\hat\Gamma_n S_\alpha - S_\alpha\Gamma S_\alpha\bigr)\bigl(zI - S_\alpha\Gamma S_\alpha\bigr)^{-1}\Bigr)^{m}.$$

Hence, on the event {‖Γ̂_n − Γ‖ ≤ ε̃/2},

$$\bigl\|\hat U_{\alpha n}(z)\bigr\| \le 2\,\tilde\varepsilon^{-3}\,\bigl\|\hat\Gamma_n - \Gamma\bigr\|^{2}. \qquad (5.44)$$

Now on the event {‖Γ̂_n − Γ‖ ≤ min(ε̃/2, ε_K/4)}, by (5.42) and (5.43),

$$\sqrt{n}\,\bigl(\hat E_{nk}(\alpha) - E_k(\alpha)\bigr) = \frac{\sqrt{n}}{2\pi i}\oint_{C_k}\Bigl[\bigl(zI - S_\alpha\hat\Gamma_n S_\alpha\bigr)^{-1} - \bigl(zI - S_\alpha\Gamma S_\alpha\bigr)^{-1}\Bigr]\,dz = (\phi_k(Z_n))(\alpha) + \frac{1}{2\pi i}\oint_{C_k}\sqrt{n}\,\hat U_{\alpha n}(z)\,dz. \qquad (5.45)$$

Now we have from (5.44) and (5.45), for any δ > 0,

$$\begin{aligned} P\bigl(\bigl\|\sqrt{n}(\hat E_{nk} - E_k) - \phi_k(Z_n)\bigr\| > \delta\bigr) &\le P\Bigl(\|\hat\Gamma_n - \Gamma\| > \min\Bigl(\frac{\tilde\varepsilon}{2},\frac{\varepsilon_K}{4}\Bigr)\Bigr) + P\Bigl(\bigl\|\sqrt{n}(\hat E_{nk} - E_k) - \phi_k(Z_n)\bigr\| > \delta,\ \|\hat\Gamma_n - \Gamma\| \le \min\Bigl(\frac{\tilde\varepsilon}{2},\frac{\varepsilon_K}{4}\Bigr)\Bigr) \\ &\le P\Bigl(\|\hat\Gamma_n - \Gamma\| > \min\Bigl(\frac{\tilde\varepsilon}{2},\frac{\varepsilon_K}{4}\Bigr)\Bigr) + P\Bigl(\sup_{0\le\alpha\le\alpha_0}\Bigl\|\frac{\sqrt{n}}{2\pi i}\oint_{C_k}\hat U_{\alpha n}(z)\,dz\Bigr\| > \delta\Bigr) \\ &\le P\Bigl(\|\hat\Gamma_n - \Gamma\| > \min\Bigl(\frac{\tilde\varepsilon}{2},\frac{\varepsilon_K}{4}\Bigr)\Bigr) + P\bigl(\sqrt{n}\,\|\hat\Gamma_n - \Gamma\|^{2} > \pi\delta\tilde\varepsilon^{3}\bigr) \to 0, \end{aligned} \qquad (5.46)$$

as n → ∞. By Lemmas 8 and 13, Φ_K(Z_n) = (ϕ_1(Z_n), ϕ_2(Z_n), …, ϕ_K(Z_n)) converges in distribution to the Gaussian element with mean zero and covariance operator Φ_K Θ i Q i*Θ*Φ_K*. Now by (5.46), {√n(Ê_{nk} − E_k), 1 ≤ k ≤ K}_n converges in distribution to the same limit.
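The key step in this proof is the expansion (5.43) with the second-order remainder bound (5.44). As an informal finite-dimensional check (ours; the matrix A and the point z are illustrative assumptions, not the paper's objects), the remainder of the first-order resolvent expansion indeed shrinks like the squared perturbation norm:

```python
import numpy as np

# Check that the remainder in the resolvent expansion (5.43) is second order,
# as bounded in (5.44): the norm of
#   (zI - A - D)^{-1} - (zI - A)^{-1} - (zI - A)^{-1} D (zI - A)^{-1}
# stays O(||D||^2) as the symmetric perturbation D shrinks.
rng = np.random.default_rng(0)
A = np.diag([4.0, 2.0, 1.0])                 # surrogate for S_a Gamma S_a
z = 4.0 + 1.0j                                # a point on a contour around 4
R = np.linalg.inv(z * np.eye(3) - A)          # unperturbed resolvent
for eps in (1e-1, 1e-2, 1e-3):
    D = eps * rng.standard_normal((3, 3))
    D = (D + D.T) / 2.0                       # small symmetric perturbation
    R_hat = np.linalg.inv(z * np.eye(3) - (A + D))
    U = R_hat - R - R @ D @ R                 # remainder, cf. U_hat in (5.43)
    print(eps, np.linalg.norm(U, 2) / np.linalg.norm(D, 2) ** 2)  # bounded ratio
```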

Define linear maps ψ_k : C_H[0, α_0] → C_{L²([a,b])}[0, α_0], 1 ≤ k ≤ K, such that for any Λ ∈ C_H[0, α_0],

$$\psi_k(\Lambda) = \bigl\{(I - E_k(\alpha))\,\Lambda(\alpha)\,\eta_k^{[\alpha]},\ 0\le\alpha\le\alpha_0\bigr\}. \qquad (5.47)$$

Then we define a linear map Ψ_K : ⊕_{k=1}^K C_H[0, α_0] → ⊕_{k=1}^K C_{L²([a,b])}[0, α_0] such that for any (Λ_1, …, Λ_K) ∈ ⊕_{k=1}^K C_H[0, α_0],

$$\Psi_K(\Lambda_1,\ldots,\Lambda_K) = \bigl(\psi_1(\Lambda_1),\ldots,\psi_K(\Lambda_K)\bigr). \qquad (5.48)$$

It is easy to see that Ψ_K is a bounded operator.

Lemma 17

The sequence {√n(η̂_k^{[α]} − η_k^{[α]}), 1 ≤ k ≤ K}_n of stochastic processes has sample paths in ⊕_{k=1}^K C_{L²([a,b])}[0, α_0] a.s. and converges in distribution to a Gaussian random element with mean zero and covariance operator Ψ_K Φ_K Θ i Q i*Θ*Φ_K*Ψ_K*.

Proof. By the definitions (5.29) of η̂_k^{[α]} and η_k^{[α]}, 1 ≤ k ≤ K, we have (η̂_k^{[α]}, η̂_k^{[α]}) = (η_k^{[α]}, η_k^{[α]}) = 1. In Ω_0^c, we have

$$\begin{aligned} \sup_{0\le\alpha\le\alpha_0}\sqrt{n}\,\bigl|(\hat\eta_k^{[\alpha]},\eta_k^{[\alpha]})^2 - 1\bigr| &= \sup_{0\le\alpha\le\alpha_0}\sqrt{n}\,\bigl|(\hat\eta_k^{[\alpha]},\eta_k^{[\alpha]})^2 - (\eta_k^{[\alpha]},\eta_k^{[\alpha]})^2\bigr| \\ &= \sup_{0\le\alpha\le\alpha_0}\sqrt{n}\,\bigl|\bigl((\hat\eta_k^{[\alpha]},\eta_k^{[\alpha]})\hat\eta_k^{[\alpha]},\,\eta_k^{[\alpha]}\bigr) - \bigl((\eta_k^{[\alpha]},\eta_k^{[\alpha]})\eta_k^{[\alpha]},\,\eta_k^{[\alpha]}\bigr)\bigr| \\ &= \sup_{0\le\alpha\le\alpha_0}\sqrt{n}\,\bigl|\bigl(\eta_k^{[\alpha]},\,(\hat E_{nk}(\alpha) - E_k(\alpha))\,\eta_k^{[\alpha]}\bigr)\bigr|. \end{aligned}$$

By (5.46), √n(Ê_{nk}(α) − E_k(α)) and ϕ_k(Z_n) have the same limit distribution. Because for any Λ ∈ C_H[0, α_0],

$$\begin{aligned} \bigl(\eta_k^{[\alpha]},\,(\phi_k(\Lambda))(\alpha)\,\eta_k^{[\alpha]}\bigr) &= \Bigl(\eta_k^{[\alpha]},\,\frac{1}{2\pi i}\oint_{C_k}\bigl[(zI - S_\alpha\Gamma S_\alpha)^{-1}\Lambda(\alpha)(zI - S_\alpha\Gamma S_\alpha)^{-1}\bigr]\,dz\;\eta_k^{[\alpha]}\Bigr) \\ &= \frac{1}{2\pi i}\oint_{C_k}\bigl(\eta_k^{[\alpha]},\,(zI - S_\alpha\Gamma S_\alpha)^{-1}\Lambda(\alpha)(zI - S_\alpha\Gamma S_\alpha)^{-1}\eta_k^{[\alpha]}\bigr)\,dz \\ &= \frac{1}{2\pi i}\oint_{C_k}\bigl(z - \lambda_k^{[\alpha]}\bigr)^{-2}\,dz\;\bigl(\eta_k^{[\alpha]},\,\Lambda(\alpha)\eta_k^{[\alpha]}\bigr) = 0, \end{aligned} \qquad (5.49)$$

where we use the facts that
$$\bigl(zI - S_\alpha\Gamma S_\alpha\bigr)^{-1}\eta_k^{[\alpha]} = \bigl(z - \lambda_k^{[\alpha]}\bigr)^{-1}\eta_k^{[\alpha]}, \qquad \oint_{C_k}\bigl(z - \lambda_k^{[\alpha]}\bigr)^{-2}\,dz = 0,$$
the latter because (z − λ_k^{[α]})^{−2} has the single-valued primitive −(z − λ_k^{[α]})^{−1} on C_k.

So we have

$$\sup_{0\le\alpha\le\alpha_0}\sqrt{n}\,\bigl|(\hat\eta_k^{[\alpha]},\eta_k^{[\alpha]})^2 - 1\bigr| \to 0$$

in probability. By (5.39) and the continuity of η̂_k^{[α]} and η_k^{[α]}, we have

$$\sup_{0\le\alpha\le\alpha_0}\sqrt{n}\,\bigl|(\hat\eta_k^{[\alpha]},\eta_k^{[\alpha]}) - 1\bigr| \to 0, \qquad (5.50)$$

in probability. Now

$$\begin{aligned} \sqrt{n}\,\bigl(\hat\eta_k^{[\alpha]} - \eta_k^{[\alpha]}\bigr) &= \sqrt{n}\,E_k(\alpha)\bigl(\hat\eta_k^{[\alpha]} - \eta_k^{[\alpha]}\bigr) + \sqrt{n}\,(I - E_k(\alpha))\bigl(\hat\eta_k^{[\alpha]} - \eta_k^{[\alpha]}\bigr) \\ &= \sqrt{n}\,\bigl((\hat\eta_k^{[\alpha]},\eta_k^{[\alpha]}) - 1\bigr)\eta_k^{[\alpha]} + \sqrt{n}\,(I - E_k(\alpha))\hat\eta_k^{[\alpha]} \\ &= \sqrt{n}\,\bigl((\hat\eta_k^{[\alpha]},\eta_k^{[\alpha]}) - 1\bigr)\eta_k^{[\alpha]} + \sqrt{n}\,(I - E_k(\alpha))\frac{\hat E_{nk}(\alpha)\eta_k^{[\alpha]}}{(\hat\eta_k^{[\alpha]},\eta_k^{[\alpha]})} \\ &= \sqrt{n}\,\bigl((\hat\eta_k^{[\alpha]},\eta_k^{[\alpha]}) - 1\bigr)\eta_k^{[\alpha]} + \frac{1}{(\hat\eta_k^{[\alpha]},\eta_k^{[\alpha]})}\,(I - E_k(\alpha))\,\sqrt{n}\bigl(\hat E_{nk}(\alpha) - E_k(\alpha)\bigr)\eta_k^{[\alpha]} \\ &= \sqrt{n}\,\bigl((\hat\eta_k^{[\alpha]},\eta_k^{[\alpha]}) - 1\bigr)\eta_k^{[\alpha]} + \frac{1}{(\hat\eta_k^{[\alpha]},\eta_k^{[\alpha]})}\,\psi_k\bigl(\sqrt{n}(\hat E_{nk} - E_k)\bigr)(\alpha). \end{aligned} \qquad (5.51)$$

By (5.50), the first term in the last line converges to 0 in probability, and (η̂_k^{[α]}, η_k^{[α]}) → 1 in probability. Hence, (√n(η̂_1^{[α]} − η_1^{[α]}), …, √n(η̂_K^{[α]} − η_K^{[α]})) has the same limit distribution as Ψ_K(√n(Ê_{n1} − E_1), …, √n(Ê_{nK} − E_K)), which converges to a Gaussian random element with mean zero and covariance operator Ψ_K Φ_K Θ i Q i*Θ*Φ_K*Ψ_K* by Lemmas 8 and 16.

Define linear maps ℧_k : C_H[0, α_0] → C[0, α_0], 1 ≤ k ≤ K, such that for any Λ ∈ C_H[0, α_0],

$$\mho_k(\Lambda) = \bigl\{\bigl(\eta_k^{[\alpha]},\,E_k(\alpha)(\psi_k(\Lambda))(\alpha)\bigr) + \bigl(\eta_k^{[\alpha]},\,\Lambda(\alpha)\eta_k^{[\alpha]}\bigr) + \bigl((\psi_k(\Lambda))(\alpha),\,E_k(\alpha)\eta_k^{[\alpha]}\bigr),\ 0\le\alpha\le\alpha_0\bigr\},$$

where ψ_k is defined in (5.47) and (ψ_k(Λ))(α) denotes the value of ψ_k(Λ) at α. Define a linear map ℧_K : ⊕_{k=1}^K C_H[0, α_0] → ⊕_{k=1}^K C[0, α_0] such that for any (Λ_1, …, Λ_K),

$$\mho_K(\Lambda_1,\ldots,\Lambda_K) = \bigl(\mho_1(\Lambda_1),\ldots,\mho_K(\Lambda_K)\bigr). \qquad (5.52)$$

It is easy to see that ℧K is a bounded operator.

Lemma 18

The sequence {√n(λ̂_k^{[α]} − λ_k^{[α]}), 1 ≤ k ≤ K}_n of stochastic processes has sample paths in ⊕_{k=1}^K C[0, α_0] and converges in distribution to a Gaussian random element with mean zero and covariance operator ℧_K Φ_K Θ i Q i*Θ*Φ_K*℧_K*.

Proof. The continuity of λ̂_k^{[α]} and λ_k^{[α]} in α follows from Lemma 11 and the inequalities

$$\bigl|\hat\lambda_k^{[\alpha]} - \hat\lambda_k^{[\alpha']}\bigr| \le \bigl\|S_\alpha\hat\Gamma_n S_\alpha - S_{\alpha'}\hat\Gamma_n S_{\alpha'}\bigr\|, \qquad \bigl|\lambda_k^{[\alpha]} - \lambda_k^{[\alpha']}\bigr| \le \bigl\|S_\alpha\Gamma S_\alpha - S_{\alpha'}\Gamma S_{\alpha'}\bigr\|,$$

for any 0 ≤ α, α′ ≤ α0. In Ω0c,

$$\begin{aligned} \sqrt{n}\,\bigl(\hat\lambda_k^{[\alpha]} - \lambda_k^{[\alpha]}\bigr) &= \sqrt{n}\,\bigl((\hat\eta_k^{[\alpha]},\hat E_{nk}(\alpha)\hat\eta_k^{[\alpha]}) - (\eta_k^{[\alpha]},E_k(\alpha)\eta_k^{[\alpha]})\bigr) \\ &= \sqrt{n}\,\bigl((\hat\eta_k^{[\alpha]},\hat E_{nk}(\alpha)\hat\eta_k^{[\alpha]}) - (\hat\eta_k^{[\alpha]},\hat E_{nk}(\alpha)\eta_k^{[\alpha]})\bigr) + \sqrt{n}\,\bigl((\hat\eta_k^{[\alpha]},\hat E_{nk}(\alpha)\eta_k^{[\alpha]}) - (\hat\eta_k^{[\alpha]},E_k(\alpha)\eta_k^{[\alpha]})\bigr) \\ &\qquad + \sqrt{n}\,\bigl((\hat\eta_k^{[\alpha]},E_k(\alpha)\eta_k^{[\alpha]}) - (\eta_k^{[\alpha]},E_k(\alpha)\eta_k^{[\alpha]})\bigr) \\ &= \bigl(\hat\eta_k^{[\alpha]},\,\hat E_{nk}(\alpha)\,\sqrt{n}(\hat\eta_k^{[\alpha]} - \eta_k^{[\alpha]})\bigr) + \bigl(\hat\eta_k^{[\alpha]},\,\sqrt{n}(\hat E_{nk}(\alpha) - E_k(\alpha))\,\eta_k^{[\alpha]}\bigr) + \bigl(\sqrt{n}(\hat\eta_k^{[\alpha]} - \eta_k^{[\alpha]}),\,E_k(\alpha)\eta_k^{[\alpha]}\bigr) \end{aligned} \qquad (5.53)$$

By Lemmas 16 and 17, η̂_k^{[α]} → η_k^{[α]} and Ê_{nk}(α) → E_k(α) in probability, uniformly in α. Hence by (5.53), √n(λ̂_k^{[α]} − λ_k^{[α]}) has the same limit distribution as

$$\bigl(\eta_k^{[\alpha]},\,E_k(\alpha)\,\sqrt{n}(\hat\eta_k^{[\alpha]} - \eta_k^{[\alpha]})\bigr) + \bigl(\eta_k^{[\alpha]},\,\sqrt{n}(\hat E_{nk}(\alpha) - E_k(\alpha))\,\eta_k^{[\alpha]}\bigr) + \bigl(\sqrt{n}(\hat\eta_k^{[\alpha]} - \eta_k^{[\alpha]}),\,E_k(\alpha)\eta_k^{[\alpha]}\bigr)$$

which, by (5.51), has the same limit distribution as

$$\bigl(\eta_k^{[\alpha]},\,E_k(\alpha)\,\psi_k\bigl(\sqrt{n}(\hat E_{nk} - E_k)\bigr)(\alpha)\bigr) + \bigl(\eta_k^{[\alpha]},\,\sqrt{n}\bigl(\hat E_{nk}(\alpha) - E_k(\alpha)\bigr)\eta_k^{[\alpha]}\bigr) + \bigl(\psi_k\bigl(\sqrt{n}(\hat E_{nk} - E_k)\bigr)(\alpha),\,E_k(\alpha)\eta_k^{[\alpha]}\bigr) = \mho_k\bigl(\sqrt{n}(\hat E_{nk} - E_k)\bigr)(\alpha).$$

Hence, {√n(λ̂_k^{[α]} − λ_k^{[α]}), 1 ≤ k ≤ K}_n has the same limit distribution as ℧_K(√n(Ê_{n1} − E_1), …, √n(Ê_{nK} − E_K)), which converges to a Gaussian random element with mean zero and covariance operator ℧_K Φ_K Θ i Q i*Θ*Φ_K*℧_K* by Lemmas 8 and 16.
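The decomposition (5.53) isolates a term that is linear in the perturbation. As a toy finite-dimensional check of the classical first-order eigenvalue perturbation underlying such decompositions (our illustration; A, D and η below are assumptions, not the paper's objects): λ_1(A + D) − λ_1(A) agrees with (η_1, D η_1) up to a remainder of order ‖D‖².

```python
import numpy as np

# First-order eigenvalue perturbation behind decompositions like (5.53):
# for symmetric A with a simple top eigenvalue and small symmetric D,
# lambda_1(A + D) - lambda_1(A) is (eta_1, D eta_1) plus an O(||D||^2) term.
rng = np.random.default_rng(1)
A = np.diag([4.0, 2.0, 1.0])
eta = np.eye(3)[:, 0]                          # top eigenvector of A
D = 1e-4 * rng.standard_normal((3, 3))
D = (D + D.T) / 2.0                            # small symmetric perturbation
exact = np.linalg.eigvalsh(A + D)[-1] - 4.0    # exact eigenvalue shift
first_order = eta @ D @ eta                    # linear term
print(abs(exact - first_order) < 1e-6)         # True: remainder is second order
```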

Define a linear map ℑ_K : ⊕_{k=1}^K C_{L²([a,b])}[0, α_0] → ⊕_{k=1}^K C_{L²([a,b])}[0, α_0] such that for any (Λ_1, …, Λ_K) ∈ ⊕_{k=1}^K C_{L²([a,b])}[0, α_0],

$$\Im_K(\Lambda_1,\ldots,\Lambda_K) = \Bigl\{\Bigl(\tfrac{1}{\|S_\alpha\eta_1^{[\alpha]}\|}\,S_\alpha\Lambda_1(\alpha),\ \ldots,\ \tfrac{1}{\|S_\alpha\eta_K^{[\alpha]}\|}\,S_\alpha\Lambda_K(\alpha)\Bigr),\ 0\le\alpha\le\alpha_0\Bigr\}. \qquad (5.54)$$

ℑ_K is a bounded operator.

Lemma 19

The sequence {√n(γ̂_k^{[α]} − γ_k^{[α]}), 1 ≤ k ≤ K}_n of stochastic processes has sample paths in ⊕_{k=1}^K C_{L²([a,b])}[0, α_0] a.s. and converges in distribution to a Gaussian random element with mean zero and covariance operator ℑ_K Ψ_K Φ_K Θ i Q i*Θ*Φ_K*Ψ_K*ℑ_K*.

Proof. By (5.31),

$$\hat\gamma_k^{[\alpha]} = \frac{S_\alpha\hat\eta_k^{[\alpha]}}{\|S_\alpha\hat\eta_k^{[\alpha]}\|}, \qquad \gamma_k^{[\alpha]} = \frac{S_\alpha\eta_k^{[\alpha]}}{\|S_\alpha\eta_k^{[\alpha]}\|}.$$

Therefore,

$$\begin{aligned} \sqrt{n}\,\bigl(\hat\gamma_k^{[\alpha]} - \gamma_k^{[\alpha]}\bigr) &= \sqrt{n}\,\Bigl(\frac{S_\alpha\hat\eta_k^{[\alpha]}}{\|S_\alpha\hat\eta_k^{[\alpha]}\|} - \frac{S_\alpha\eta_k^{[\alpha]}}{\|S_\alpha\eta_k^{[\alpha]}\|}\Bigr) \\ &= \sqrt{n}\,\Bigl(\frac{S_\alpha\hat\eta_k^{[\alpha]}}{\|S_\alpha\hat\eta_k^{[\alpha]}\|} - \frac{S_\alpha\hat\eta_k^{[\alpha]}}{\|S_\alpha\eta_k^{[\alpha]}\|}\Bigr) + \sqrt{n}\,\Bigl(\frac{S_\alpha\hat\eta_k^{[\alpha]}}{\|S_\alpha\eta_k^{[\alpha]}\|} - \frac{S_\alpha\eta_k^{[\alpha]}}{\|S_\alpha\eta_k^{[\alpha]}\|}\Bigr) \\ &= \sqrt{n}\,\Bigl(\frac{1}{\|S_\alpha\hat\eta_k^{[\alpha]}\|} - \frac{1}{\|S_\alpha\eta_k^{[\alpha]}\|}\Bigr)S_\alpha\hat\eta_k^{[\alpha]} + \frac{1}{\|S_\alpha\eta_k^{[\alpha]}\|}\,S_\alpha\bigl(\sqrt{n}(\hat\eta_k^{[\alpha]} - \eta_k^{[\alpha]})\bigr). \end{aligned} \qquad (5.55)$$

Because

$$\bigl|\,\|S_\alpha\hat\eta_k^{[\alpha]}\| - \|S_\alpha\eta_k^{[\alpha]}\|\,\bigr| \le \bigl\|\hat\eta_k^{[\alpha]} - \eta_k^{[\alpha]}\bigr\| \to 0$$

in probability, by the definition (5.54) of ℑ_K, (5.55) and Lemma 17, {√n(γ̂_k^{[α]} − γ_k^{[α]}), 1 ≤ k ≤ K}_n has the same limit distribution as ℑ_K({√n(η̂_k^{[α]} − η_k^{[α]}), 1 ≤ k ≤ K}), which converges to a Gaussian random element with mean zero and covariance operator ℑ_K Ψ_K Φ_K Θ i Q i*Θ*Φ_K*Ψ_K*ℑ_K*.

Proof of Corollary 4.1

By Lemma 18 and Lemma 19, the stochastic processes {√n(λ̂_k^{[α]} − λ_k^{[α]}), 1 ≤ k ≤ K}_n and {√n(γ̂_k^{[α]} − γ_k^{[α]}), 1 ≤ k ≤ K}_n converge in distribution; hence they are tight by Theorem 5.2 in Billingsley [1], since C_H[0, α_0] and C_{L²([a,b])}[0, α_0] are both complete and separable. Therefore, for any ϵ > 0, one can find a positive number M depending on ϵ such that

$$\sup_n P\Bigl(\max_{1\le k\le K}\ \sup_{0\le\alpha\le\alpha_0}\bigl|\sqrt{n}\bigl(\hat\lambda_k^{[\alpha]} - \lambda_k^{[\alpha]}\bigr)\bigr| \ge M\Bigr) \le \epsilon, \qquad \sup_n P\Bigl(\max_{1\le k\le K}\ \sup_{0\le\alpha\le\alpha_0}\bigl\|\sqrt{n}\bigl(\hat\gamma_k^{[\alpha]} - \gamma_k^{[\alpha]}\bigr)\bigr\| \ge M\Bigr) \le \epsilon.$$

In other words,

$$\hat\lambda_k^{[\alpha]} - \lambda_k^{[\alpha]} = O_p\bigl(\tfrac{1}{\sqrt{n}}\bigr), \qquad \hat\gamma_k^{[\alpha]} - \gamma_k^{[\alpha]} = O_p\bigl(\tfrac{1}{\sqrt{n}}\bigr)$$

uniformly in α, which, combined with Theorem 4.1, gives our corollary.
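The O_p(n^{−1/2}) statement can be seen in a toy simulation (ours, not the paper's setting): for a finite-dimensional surrogate of Γ̂_n, the √n-scaled eigenvalue errors remain of constant order as n grows.

```python
import numpy as np

# Toy Monte Carlo illustration of the O_p(1/sqrt(n)) rate in Corollary 4.1 on a
# finite-dimensional surrogate: eigenvalues of the sample covariance of
# N(0, diag(4, 2, 1)) data; the sqrt(n)-scaled errors stay O(1) as n grows.
rng = np.random.default_rng(2)
true_eigs = np.array([4.0, 2.0, 1.0])
for n in (100, 1000, 10000):
    X = rng.standard_normal((n, 3)) * np.sqrt(true_eigs)   # cov = diag(4, 2, 1)
    sample_eigs = np.sort(np.linalg.eigvalsh(X.T @ X / n))[::-1]
    print(n, np.sqrt(n) * np.abs(sample_eigs - true_eigs)) # roughly constant order
```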

Proof of Corollary 4.2

First, we have the decompositions

$$\begin{aligned} \sqrt{n}\,\bigl(\hat\lambda_k^{[\alpha_n]} - \lambda_k\bigr) &= \sqrt{n}\,\bigl(\hat\lambda_k^{[\alpha_n]} - \lambda_k^{[\alpha_n]}\bigr) + \sqrt{n}\,\bigl(\lambda_k^{[\alpha_n]} - \lambda_k\bigr), \\ \sqrt{n}\,\bigl(\hat\gamma_k^{[\alpha_n]} - \gamma_k\bigr) &= \sqrt{n}\,\bigl(\hat\gamma_k^{[\alpha_n]} - \gamma_k^{[\alpha_n]}\bigr) + \sqrt{n}\,\bigl(\gamma_k^{[\alpha_n]} - \gamma_k\bigr). \end{aligned}$$

Under the conditions on α_n for eigenvalues and eigenfunctions respectively, by Theorem 4.1, we have √n(λ_k^{[α_n]} − λ_k) → 0 and √n(γ_k^{[α_n]} − γ_k) → 0, respectively. Since {√n(γ̂_k^{[α]} − γ_k^{[α]}), 1 ≤ k ≤ K, 0 ≤ α ≤ α_0}_n and {√n(λ̂_k^{[α]} − λ_k^{[α]}), 1 ≤ k ≤ K, 0 ≤ α ≤ α_0}_n converge in distribution by Theorem 4.2, they are tight. Hence, the asymptotic normalities of √n(λ̂_k^{[α_n]} − λ_k^{[α_n]}) and √n(γ̂_k^{[α_n]} − γ_k^{[α_n]}) follow from Theorem 4.2 and the following lemma. The corollary then follows at once.

Lemma 20

Suppose that F is a metric space with distance d. Let C_F[0, α_0] denote the space of continuous functions on [0, α_0] taking values in F. Suppose we have a sequence {Y_n(α), 0 ≤ α ≤ α_0, n ∈ ℕ} of stochastic processes with sample paths in C_F[0, α_0]. Assume that {Y_n} is tight and Y_n(0) converges in distribution to a random element Y in F. Then for any sequence α_n of positive numbers converging to 0, Y_n(α_n) also converges in distribution to Y.

Proof. First, we show that for any ϵ > 0 we can find δ > 0 such that

$$\sup_n P\Bigl(\sup_{0\le\alpha,\alpha'\le\delta} d\bigl(Y_n(\alpha), Y_n(\alpha')\bigr) > \epsilon\Bigr) \le \epsilon.$$

Since {Y_n} is tight, we can find a compact subset Ξ of C_F[0, α_0] such that

$$\sup_n P\bigl(Y_n \notin \Xi\bigr) \le \epsilon.$$

We can find a finite number of Λ_1, …, Λ_m ∈ Ξ such that for any Λ ∈ Ξ there is an i with sup_{0≤α≤α_0} d(Λ_i(α), Λ(α)) ≤ ϵ/3. Furthermore, we can find δ > 0 such that

$$\max_{1\le i\le m}\ \sup_{0\le\alpha,\alpha'\le\delta} d\bigl(\Lambda_i(\alpha), \Lambda_i(\alpha')\bigr) \le \frac{\epsilon}{3}.$$

Now it is easy to see that for any Λ ∈ Ξ,

$$\sup_{0\le\alpha,\alpha'\le\delta} d\bigl(\Lambda(\alpha), \Lambda(\alpha')\bigr) \le \epsilon.$$

Hence,

$$\sup_n P\Bigl(\sup_{0\le\alpha,\alpha'\le\delta} d\bigl(Y_n(\alpha), Y_n(\alpha')\bigr) > \epsilon\Bigr) \le \sup_n P\bigl(Y_n \notin \Xi\bigr) \le \epsilon.$$

If α_n ≤ δ, we have

$$P\bigl(d(Y_n(0), Y_n(\alpha_n)) > \epsilon\bigr) \le \epsilon.$$

Since ϵ is arbitrary, d(Y_n(0), Y_n(α_n)) → 0 in probability. Since Y_n(0) converges in distribution to Y, it follows that Y_n(α_n) also converges in distribution to Y.

Acknowledgments

Supported in part by NIH grant R01 GM59507, a pilot project from the Yale Pepper Center, and NSF grant DMS 0714817.


References

1. Billingsley P. Convergence of Probability Measures. 2nd Edition. Wiley-Interscience; 1999.
2. Cardot H, Ferraty F, Sarda P. Functional linear model. Statistics and Probability Letters. 1999;45:11–22.
3. Dauxois J, Pousse A, Romain Y. Asymptotic theory for the principal component analysis of a random vector function: some applications to statistical inference. J. Multivariate Anal. 1982;12:136–154.
4. Dunford N, Schwartz JT. Linear Operators, General Theory, Part 1. Wiley-Interscience; 1988.
5. Dunford N, Schwartz JT. Linear Operators, Spectral Theory, Self Adjoint Operators in Hilbert Space, Part 2. Wiley-Interscience; 1988.
6. Ferraty F, Vieu P. Nonparametric Functional Data Analysis: Theory and Practice. Springer; 2006.
7. Glasserman P. Monte Carlo Methods in Financial Engineering. 1st Edition. Springer; 2003.
8. Huang JZ, Shen H, Buja A. Functional principal components analysis via penalized rank one approximation. Electron. J. Statist. 2008;2:678–695.
9. Lax PD. Functional Analysis (Pure and Applied Mathematics: A Wiley-Interscience Series of Texts, Monographs and Tracts). Wiley-Interscience; 2002.
10. Ledoux M, Talagrand M. Probability in Banach Spaces: Isoperimetry and Processes. Ergebnisse der Mathematik und ihrer Grenzgebiete, 3. Folge, Band 23. Springer; 2006.
11. Qi X, Zhao H. Functional principal component analysis for discretely observed functional data. 2010. Submitted.
12. Ramsay JO, Silverman BW. Functional Data Analysis. 2nd Edition. Springer; New York: 2005.
13. Riesz F, Sz.-Nagy B. Functional Analysis. Dover Publications; 1990.
14. Rudin W. Functional Analysis. 2nd Edition. McGraw-Hill Science/Engineering/Math; 1991.
15. Silverman BW. Smoothed functional principal components analysis by choice of norm. The Annals of Statistics. 1996;24:1–24.
16. Weinberger HF. Variational Methods for Eigenvalue Approximation. 2nd Edition. CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial Mathematics; 1987.
