Abstract
Principal component analysis (PCA) is one of the key techniques in functional data analysis. One important feature of functional PCA is the need to smooth or regularize the estimated principal component curves. Silverman's method for smoothed functional principal component analysis is an important approach in situations where the sample curves are fully observed, owing to its theoretical and practical advantages. However, the lack of knowledge about the theoretical properties of this method makes it difficult to generalize it to the situation where the sample curves are only observed at discrete time points. In this paper, we first establish the existence of the solutions of the successive optimization problems in this method. We then provide upper bounds for the bias parts of the estimation errors for both eigenvalues and eigenfunctions. We also prove functional central limit theorems for the variation parts of the estimation errors. As a corollary, we obtain the convergence rates of the estimators of eigenvalues and eigenfunctions, where these rates depend on both the sample size and the smoothing parameters. Under some conditions on the convergence rates of the smoothing parameters, we also prove the asymptotic normality of the estimators.
Keywords: Functional PCA, smoothing methods, roughness penalty, convergence rates, functional central limit theorem, asymptotic normality
1. Introduction
Principal component analysis (PCA) is one of the key techniques in multivariate analysis and functional data analysis. An important difference between classical PCA and functional PCA is that there is a need for smoothing or regularizing of the estimated principal component curves in functional PCA (see Chapter 9 in Ramsay and Silverman [12]). Many methods have been proposed to estimate the smoothed functional principal components when the sample curves are fully observed. A general overview of these methods and an extensive list of references can be found in Ramsay and Silverman [12]. The reader can find more discussion of theoretical aspects and of nonparametric methods for functional data analysis in Ferraty and Vieu [6]. Functional PCA has many important applications. For example, functional principal component regression (see for instance Cardot, Ferraty and Sarda [2]) is a direct application of functional principal components analysis.
The approach proposed in Silverman [15] is an important method for smoothed functional PCA (see Chapter 9 in Ramsay and Silverman [12]) due to its theoretical and practical advantages. First, the weak assumptions underlying this method make it applicable to data from many fields. Silverman [15] did not make any assumptions on the mean curves and sample curves. Hence, in addition to data with smooth random curves, this method can be applied to analyze data where the sample curves are unsmooth or even discontinuous, such as those encountered in financial engineering, survival analysis and other fields. For covariance functions, Silverman [15] only assumed that they have series expansions in terms of their eigenfunctions, without imposing any smoothness constraint. This is attractive because the covariance functions are continuous but unsmooth in many important models, such as stochastic differential equation models in financial engineering and counting process models in survival analysis. Second, Silverman's method controls the smoothness of the eigenfunction curves by directly imposing roughness penalties on these functions instead of on the sample curves or covariance functions. Furthermore, this approach changes the eigenvalue and eigenfunction problems in the usual L2 space into problems in another Hilbert space, the Sobolev space (with a norm different from the usual norm in the Sobolev space). Therefore, many powerful tools from the theory of Hilbert spaces can be employed to study the properties of this method. Third, this approach incorporates the smoothing step into the step for computing eigenvalues and eigenfunctions. Therefore, this method is computationally efficient, with the same computational load as the usual unsmoothed functional PCA. Fourth, the estimates produced by this method are invariant under scale transformations. As pointed out by Huang, Shen and Buja [8], invariance under scale transformations should be a guiding principle when introducing roughness penalties into functional PCA.
Despite all these advantages, the lack of knowledge about the theoretical properties of this method makes it difficult to generalize it to the situations where the sample curves are only observed at discrete time points. Silverman [15] only proved consistency of the estimates as the sample size goes to infinity and the smoothing parameter goes to zero. Even the existence of the solutions to the successive optimization problems in this method had not been established. It is not clear how the estimation errors depend on the sample size and the smoothing parameter. The asymptotic normality of the estimators also needs to be proved. In this paper, we aim to solve these open problems. In Section 2, we give the detailed background, basic notations and our main assumptions. In Section 3, Silverman's method is introduced and the existence theorem for the successive optimization problems is proven. Our main results appear in Section 4. Section 5 contains detailed proofs of our theorems.
2. Notations and main assumptions
We introduce notations and definitions used throughout the paper. Let ℕ denote the collection of all positive integers. We consider a finite time interval [a, b]. In this paper, we will mainly consider functions in the following two spaces: the L2 space

L2([a, b]) = {f : ∫_a^b f(t)² dt < ∞}

and the Sobolev space

W2²([a, b]) = {f ∈ L2([a, b]) : f and f′ are absolutely continuous and f″ ∈ L2([a, b])},

where f′ and f″ denote the first and second derivatives of f, respectively. For any f, g ∈ L2([a, b]), define the usual inner product

(f, g) = ∫_a^b f(t)g(t) dt,
with corresponding squared norm ∥f∥² = (f, f). Given a smoothing parameter α > 0, for any f, g ∈ W2²([a, b]), define

[f, g] = ∫_a^b f″(t)g″(t) dt

and the inner product

(f, g)α = (f, g) + α[f, g],

with corresponding squared norm ∥f∥α² = (f, f)α. Note that if α = 0, we return to the L2([a, b]) space. For any bounded operator B from L2([a, b]) to L2([a, b]), define the norm

∥B∥ = sup{∥Bf∥ : f ∈ L2([a, b]), ∥f∥ = 1}.   (2.1)
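To make the α-inner product concrete, here is a minimal numerical sketch (ours, not part of the paper; the grid, the crude finite-difference second derivative, and the function name alpha_inner are our own choices):

```python
import numpy as np

def alpha_inner(f_vals, g_vals, t, alpha):
    """Quadrature approximation of (f, g)_alpha = (f, g) + alpha * (f'', g'')
    for functions sampled on the grid t."""
    f2 = np.gradient(np.gradient(f_vals, t), t)  # crude second derivatives
    g2 = np.gradient(np.gradient(g_vals, t), t)
    return np.trapz(f_vals * g_vals, t) + alpha * np.trapz(f2 * g2, t)

t = np.linspace(0.0, 1.0, 501)
f = np.sin(2 * np.pi * t)
# (f, f)_alpha should be close to 1/2 + alpha * (2*pi)^4 / 2
print(alpha_inner(f, f, t, alpha=0.1))
```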
For any measurable function A(s, t) on [a, b] × [a, b], if

∫_a^b ∫_a^b A(s, t)² ds dt < ∞,

then A defines a bounded operator from L2([a, b]) to L2([a, b]). To simplify the notation, we also use A to denote this operator, that is,

(Af)(s) = ∫_a^b A(s, t)f(t) dt,

and we have

∥A∥ ≤ (∫_a^b ∫_a^b A(s, t)² ds dt)^{1/2}.
Let X(t), a ≤ t ≤ b, be a measurable stochastic process on [a, b]. Under Assumption 1 below, X(t) ∈ L2([a, b]) a.s. Let {X1(t), X2(t), ⋯, Xn(t)} be i.i.d. sample curves from the distribution of X(t). Assume that EX(t) = ν(t). Define Γ to be the covariance function

Γ(s, t) = E[(X(s) − ν(s))(X(t) − ν(t))],

and Γ̂n to be the sample covariance function

Γ̂n(s, t) = (1/n) Σ_{i=1}^n (Xi(s) − X̄(s))(Xi(t) − X̄(t)),

where X̄ is the sample mean curve

X̄(t) = (1/n) Σ_{i=1}^n Xi(t).
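As an illustration (our own sketch; the paper assumes fully observed curves, while any computation necessarily works with curves sampled on a grid), X̄ and Γ̂n can be computed as follows:

```python
import numpy as np

def sample_covariance(X):
    """X: (n, m) array, row i = curve X_i sampled on a common grid of m points.
    Returns the sample mean curve and the sample covariance surface."""
    Xbar = X.mean(axis=0)
    Xc = X - Xbar                        # center each curve
    Gamma_hat = Xc.T @ Xc / X.shape[0]   # Gamma_hat(s, t) evaluated on the grid
    return Xbar, Gamma_hat

# Example: Brownian-motion paths on [0, 1]; Gamma_hat should approach min(s, t).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
X = np.zeros((500, 101))
X[:, 1:] = np.cumsum(rng.normal(scale=np.sqrt(1.0 / 100), size=(500, 100)), axis=1)
Xbar, Gamma_hat = sample_covariance(X)
```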
We will give our basic assumptions below. Silverman [15] made three assumptions in Section 5.2 in order to prove the consistency result. Our assumptions are stronger than those in Silverman [15].
Assumption 1
E(∫_a^b X(t)² dt)² < ∞.   (2.2)
Remark
This assumption is stronger than the first assumption in Section 5.2 of Silverman [15]. Under condition (2.2), the central limit theorem for the sample covariance function holds (see Section 2 in Dauxois, Pousse and Romain [3] and Chapter 10 in Ledoux and Talagrand [10]).
- Assumption 1 is satisfied by many stochastic processes used in applications. For example, if X(t) is a bounded process, it is obvious that (2.2) is true. Gaussian processes are an important class of stochastic processes which are widely used in statistics and other areas. Suppose that X(t) is a Gaussian process with mean zero. Then

E(∫_a^b X(t)² dt)² ≤ 3(∫_a^b Γ(t, t) dt)².

Hence, if Γ(t, t) is integrable on [a, b], which is satisfied by the Gaussian processes commonly encountered in applications, (2.2) is true. Now let us consider standard Brownian motion, the most widely studied Gaussian process. For standard Brownian motion, Γ(t, t) = t, hence Assumption 1 is satisfied. It is well known that its sample paths are continuous and nowhere differentiable almost surely. For non-Gaussian processes, let us consider a Poisson process with rate 1 on [0, 1]. Its sample paths are step functions taking only integer values and hence discontinuous. It is easy to verify that Assumption 1 is satisfied by Poisson processes. - Under condition (2.2), we have

∫_a^b ∫_a^b Γ(s, t)² ds dt ≤ (∫_a^b Γ(t, t) dt)² < ∞.
Therefore, the operator Γ is a Hilbert-Schmidt operator, hence a compact operator (see Section XI.6 in Dunford and Schwartz [5] or Section 97 in Riesz and Sz.-Nagy [13]). It follows that the set of eigenvalues of this operator is bounded and at most countable, with at most one limit point at 0. Because the covariance operator Γ is always nonnegative-definite, all the eigenvalues are nonnegative. Let λ1 ≥ λ2 ≥ ⋯ ≥ 0 be the collection of all eigenvalues and let γ1, γ2, ⋯ be the corresponding eigenfunctions. Every eigenfunction has been scaled to have L2-norm 1. The set of all the eigenfunctions forms an orthonormal basis of L2([a, b]). Furthermore, we have the decomposition

Γ(s, t) = Σ_{j=1}^∞ λj γj(s) γj(t),   (2.3)

where the series on the right hand side converges in the L2 sense. If Γ is a continuous function, the series on the right hand side converges absolutely and uniformly. Although Silverman [15] did not assume that Γ is square integrable, he assumed the decomposition form (2.3). - We have
- By (2.2), X(s) is square integrable a.s. Hence, the sample covariance function Γ̂n satisfies

∫_a^b ∫_a^b Γ̂n(s, t)² ds dt < ∞

a.s. Then the eigenvalues λ̂1 ≥ λ̂2 ≥ ⋯ ≥ 0 of Γ̂n are well defined since the operator Γ̂n is nonnegative-definite, and the corresponding eigenfunctions γ̂j, j ∈ ℕ, satisfy

Γ̂n γ̂j = λ̂j γ̂j, ∥γ̂j∥ = 1.
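Continuing the sketch above (again our own illustration, not the paper's procedure), the eigenpairs (λ̂j, γ̂j) can be approximated by discretizing the integral operator: on a uniform grid with spacing h, the operator with kernel Γ̂n acts as the matrix h·Γ̂n.

```python
import numpy as np
from numpy.linalg import eigh

def fpca_unsmoothed(Gamma_hat, t):
    """Approximate eigenvalues/eigenfunctions of the covariance operator whose
    kernel is Gamma_hat, discretized on a uniform grid t with spacing h."""
    h = t[1] - t[0]
    evals, evecs = eigh(h * Gamma_hat)      # operator ~ h * kernel matrix
    order = np.argsort(evals)[::-1]         # largest eigenvalue first
    evals, evecs = evals[order], evecs[:, order]
    return evals, evecs / np.sqrt(h)        # rescale to unit L2 norm

lam_hat, gam_hat = fpca_unsmoothed(Gamma_hat, t)  # from the previous sketch
```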
Suppose that we are interested in estimating the first K eigenvalues and eigenfunctions of Γ.
Assumption 2
Any eigenvalue λj, 1 ≤ j ≤ K, has multiplicity 1, so that

λ1 > λ2 > ⋯ > λK > λK+1 ≥ 0.
Remark
This assumption is just the third assumption in Section 5.2 of Silverman [15]. If an eigenvalue has multiplicity 1, then the corresponding eigenfunction is uniquely determined up to a sign. If the multiplicity is larger than 1, the eigenfunctions cannot be uniquely determined up to a sign.
Assumption 3
The eigenfunctions γj, 1 ≤ j ≤ K, belong to W2²([a, b]).
Remark
This assumption is the second assumption in Section 5.2 of Silverman [15] and is essential in our paper.
- If the covariance function Γ satisfies some smoothness conditions, then Assumption 3 is true. For example, suppose that Γ(s, t), ∂²Γ(s, t)/∂s² and ∂⁴Γ(s, t)/∂s²∂t² are all continuous on [a, b] × [a, b] (hence they are bounded and square integrable); then one can easily verify that

γk″(s) = λk^{−1} ∫_a^b [∂²Γ(s, t)/∂s²] γk(t) dt.

Hence, by the Cauchy-Schwarz inequality and ∥γk∥ = 1, we have γk″ ∈ L2([a, b]), so that γk ∈ W2²([a, b]). - There are many important random processes whose covariance functions are not smooth, but whose eigenfunctions corresponding to nonzero eigenvalues belong to W2²([a, b]). The simplest examples are standard Brownian motion and the Poisson process with rate 1 on the time interval [0, 1]. Their covariance functions are the same and equal min(s, t), 0 ≤ s, t ≤ 1 (see page 89 in the book by Glasserman [7]). The eigenvalues and eigenfunctions are

λj = 4/((2j − 1)²π²), γj(t) = √2 sin((2j − 1)πt/2), j ∈ ℕ

(checked numerically in the sketch following this remark).
The next example is the famous Black-Scholes model in finance. Let St denote the price of a stock at time t. Then St satisfies the following SDE,

dSt = μSt dt + σSt dWt,   (2.4)

where μ is the instantaneous mean return, σ is the instantaneous return volatility and Wt is a Brownian motion. The covariance function of St is smooth except at the points on the diagonal line {(s, t) : s = t}. The same is true for the following example. Consider the counting process model in survival analysis. Let Nt be the number of occurrences of the event in [0, t]. Then Nt satisfies

Nt = ∫_0^t λ(s) ds + Mt,

where λ(t) is a smooth intensity function and Mt is a martingale.
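As an informal numerical check of the Brownian-motion example (our own sketch; the grid size is arbitrary), the eigenvalues of the kernel min(s, t) on [0, 1] can be compared with the closed form above:

```python
import numpy as np
from numpy.linalg import eigh

t = np.linspace(0.0, 1.0, 400)
h = t[1] - t[0]
Cmin = np.minimum.outer(t, t)                 # covariance kernel min(s, t)
num = np.sort(eigh(h * Cmin)[0])[::-1][:5]    # numerical operator eigenvalues

j = np.arange(1, 6)
exact = 4.0 / ((2 * j - 1) ** 2 * np.pi ** 2) # closed form 4/((2j-1)^2 pi^2)
print(num, exact, sep="\n")                   # the two should nearly agree
```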
Silverman [15] introduced a “half-smoothing” operator which plays an important role in this paper. We give a rigorous definition of this operator here. We first define an unbounded operator L in L2([a, b]). The domain of L is 𝒟(L) = {f ∈ L2([a, b]) : f, f′ are absolutely continuous and f″ ∈ L2([a, b])}, and for any f ∈ 𝒟(L),

Lf = f″.
Then L is a closed but unbounded operator and 𝒟(L) is dense in L2([a, b]) (for the definition of closed operators, see Chapter VIII of Riesz and Sz.-Nagy [13] or Chapter 13 of Rudin [14]). Let L* be the adjoint operator of L. By the theorem in Section 118 of Riesz and Sz.-Nagy [13] or Theorem 13.13 in Rudin [14], (I + αL*L)−1 is a bounded, positive self-adjoint operator with norm less than or equal to 1, where α ≥ 0 is the smoothing parameter. Now it follows from Theorems 12.33 and 13.31 in Rudin [14] that (I + αL*L)−1 has a unique positive and self-adjoint square root Sα with norm less than or equal to 1, which is the “half-smoothing” operator in Silverman [15]. Therefore,

Sα² = (I + αL*L)−1,   (2.5)

and by Theorem 13.11 (b) in Rudin [14], the inverse Sα−1 exists and is self-adjoint because (I + αL*L)−1 is invertible.
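For intuition about Sα, here is a rough matrix analogue under our own finite-difference discretization (a sketch, not the paper's operator-theoretic construction): replace L by a second-difference matrix D and take the positive square root of (I + αDᵀD)⁻¹.

```python
import numpy as np
from scipy.linalg import inv, sqrtm

def half_smoother(m, alpha, h):
    """Matrix analogue of S_alpha = (I + alpha L*L)^(-1/2) on an m-point grid
    with spacing h; D approximates L = d^2/dt^2 at interior points."""
    D = (np.diag(np.ones(m - 1), 1) - 2.0 * np.eye(m)
         + np.diag(np.ones(m - 1), -1))[1:-1] / h ** 2
    M = np.eye(m) + alpha * D.T @ D           # I + alpha L*L
    return np.real(sqrtm(inv(M)))             # unique positive square root

S = half_smoother(m=101, alpha=1e-4, h=0.01)  # S @ S ~ inv(I + alpha D'D)
```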
3. Silverman's approach to smoothed functional PCA
In this section, we always assume that the independent sample curves X1(t), …, Xn(t) are entirely observed. We first consider the usual population functional principal components. The first population functional principal component is defined as the linear functional ℓ1(X) of X which maximizes

Var(ℓ(X))

over all nonzero bounded linear functionals ℓ on L2([a, b]) with norm ∥ℓ∥ = 1. The second population functional principal component is defined as the linear functional ℓ2(X) of X which maximizes

Var(ℓ(X))

over all linear functionals ℓ with norm ‖ℓ‖ = 1 such that ℓ(X) is uncorrelated with ℓ1(X). Similarly, we can define all the other population functional principal components ℓ3(X), …. Because X takes values in L2([a, b]), which is a real Hilbert space, by the Riesz representation theorem, for any bounded linear functional ℓ there is a unique γ ∈ L2([a, b]) such that for any f ∈ L2([a, b]),

ℓ(f) = (γ, f).

Hence there exist γj ∈ L2([a, b]), j ∈ ℕ, with ‖γj‖ = 1, such that the population functional principal components are ℓj(X) = (γj, X), j ∈ ℕ. γj is called the j-th principal component weight function or the j-th principal component curve. Because

Var((γ, X)) = (γ, Γγ),

γ1 is the solution of the following optimization problem,

maximize (γ, Γγ) subject to γ ∈ L2([a, b]), ∥γ∥ = 1.   (3.1)
The maximum value of (3.1) is just the largest eigenvalue λ1 of Γ, and γ1 is the corresponding eigenfunction (see Section 2, Chapter 3 in Weinberger [16]). γ2 is the solution of the optimization problem,

maximize (γ, Γγ) subject to ∥γ∥ = 1 and (γ, γ1) = 0.   (3.2)
The maximum value of (3.2) is just the second eigenvalue λ2 of Γ and γ2 is the corresponding eigenfunction. Similarly, γj is the eigenfunction corresponding to the eigenvalue λj which is also the variance of the j-th principal component.
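For completeness, here is the standard variational argument behind (3.1) (a routine derivation we supply, using only that Γ is self-adjoint and nonnegative-definite); it is not spelled out in the paper:

```latex
% Maximize (\gamma, \Gamma\gamma) subject to \|\gamma\|^2 = 1 via a Lagrangian:
\[
J(\gamma) = (\gamma, \Gamma\gamma) - \lambda\bigl((\gamma, \gamma) - 1\bigr).
\]
% Setting the derivative along any direction h to zero, and using self-adjointness,
\[
2(h, \Gamma\gamma) - 2\lambda (h, \gamma) = 0 \quad \text{for all } h
\;\Longrightarrow\; \Gamma\gamma = \lambda\gamma,
\]
% so any maximizer is an eigenfunction, and the attained value is
\[
(\gamma, \Gamma\gamma) = \lambda(\gamma, \gamma) = \lambda,
\]
% which is largest when \lambda = \lambda_1, with \gamma_1 the eigenfunction.
```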
Because the covariance function Γ is usually unknown, we cannot obtain the population principal component weight functions directly. Hence, one uses the sample covariance function Γ̂n to estimate Γ and uses the eigenvalues and eigenfunctions of Γ̂n to estimate the eigenvalues and eigenfunctions of Γ. We call these the non-smooth estimators. However, the non-smooth principal component curves can show substantial variability (see Chapter 9 in Ramsay and Silverman [12]). There is a need for smoothing of the estimated principal component weight functions.
Silverman [15] (see also Chapter 9 in Ramsay and Silverman [12]) proposed a method of incorporating smoothing by replacing the usual L2 norm with a norm that takes the roughness of the functions into account. Let α be a nonnegative smoothing parameter. Define the estimators {(λ̂jα, γ̂jα) : j ∈ ℕ} of {(λj, γj) : j ∈ ℕ} to be the solutions of the following successive optimization problems. First, γ̂1α is the solution of the optimization problem

maximize (γ, Γ̂nγ)/∥γ∥α² over nonzero γ ∈ W2²([a, b]).   (3.3)

Let λ̂1α be the maximum value of (3.3). For any k ∈ ℕ, if we have obtained (λ̂jα, γ̂jα), 1 ≤ j ≤ k, then γ̂k+1α is the solution of the optimization problem

maximize (γ, Γ̂nγ)/∥γ∥α² over nonzero γ ∈ W2²([a, b]) with (γ, γ̂jα)α = 0, 1 ≤ j ≤ k,   (3.4)

and λ̂k+1α is the maximum value of (3.4). Note that (λ̂jα, γ̂jα) depends on both the sample size n and the smoothing parameter α.
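On a grid, the successive problems (3.3) and (3.4) collapse into a single generalized symmetric eigenproblem, because the α-orthogonality constraints are orthogonality in the inner product induced by I + αDᵀD. The following sketch is one possible implementation under our own discretization (it reuses Gamma_hat and t from the earlier sketches):

```python
import numpy as np
from scipy.linalg import eigh

def silverman_fpca(Gamma_hat, t, alpha, K):
    """Grid version of (3.3)-(3.4): solve h*Gamma_hat v = lam (I + alpha D'D) v,
    where D approximates d^2/dt^2, so that v' D'D v ~ (1/h) * int (f'')^2."""
    m, h = len(t), t[1] - t[0]
    D = (np.diag(np.ones(m - 1), 1) - 2.0 * np.eye(m)
         + np.diag(np.ones(m - 1), -1))[1:-1] / h ** 2
    lam, V = eigh(h * Gamma_hat, np.eye(m) + alpha * D.T @ D)
    lam, V = lam[::-1][:K], V[:, ::-1][:, :K]   # K largest eigenvalues first
    return lam, V / np.sqrt(h)                  # values of the weight functions

lam_a, gam_a = silverman_fpca(Gamma_hat, t, alpha=1e-4, K=3)
```

scipy returns the eigenvalues of the matrix pair in ascending order and normalizes eigenvectors in the (I + αDᵀD) inner product, which matches the scale-invariance of the Rayleigh quotient in (3.3) up to the grid scaling.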
First of all, we need to show that the solutions of the successive optimization problems (3.3) and (3.4) exist.
Theorem 3.1
Under Assumption 1, the solutions of the successive optimization problems (3.3) and (3.4) exist for any α ≥ 0 almost surely. Moreover, we have, for any f ∈ W2²([a, b]) and j ∈ ℕ,

(f, Γ̂nγ̂jα) = λ̂jα(f, γ̂jα)α.   (3.5)
Similarly, define {(λjα, γjα) : j ∈ ℕ} to be the solutions of the successive optimization problems (3.3) and (3.4) with Γ̂n replaced by Γ. We then have the analogous equalities for Γ: for any f ∈ W2²([a, b]) and j ∈ ℕ,

(f, Γγjα) = λjα(f, γjα)α.   (3.6)
Note that λ̂j0 = λ̂j, γ̂j0 = γ̂j, λj0 = λj and γj0 = γj.
Theorem 1 in Silverman [15] gives the consistency of the estimators λ̂jα and γ̂jα as α → 0 and n → ∞.
4. Asymptotic theory
Fix a positive integer K. We will assume throughout this section that we want to estimate the first K principal component curves. For any 1 ≤ k ≤ K, define
Then, under Assumption 3, Lk is finite and is a measure of the roughness of the first k eigenfunctions of Γ. For standard Brownian motion and the Poisson process with rate 1 (see Remark (3) after Assumption 3),
For any 1 ≤ k ≤ K, we have the decompositions

λ̂kα − λk = (λ̂kα − λkα) + (λkα − λk),   (4.1)

γ̂kα − γk = (γ̂kα − γkα) + (γkα − γk).   (4.2)
The last terms on the right hand sides of both (4.1) and (4.2) are nonrandom. They are the “bias terms” due to the introduction of α. We will give upper bounds for the norms of these terms. The first terms on the right hand sides of both (4.1) and (4.2) are the “variation terms” due to the randomness of the sample curves. We will prove a functional central limit theorem for these terms. In order to avoid any confusion, it should be pointed out that (4.1) and (4.2) are not bias-variance decompositions in the strict sense, because λkα and γkα are not the expectations of λ̂kα and γ̂kα, respectively. Since it is hard to express or characterize the exact expectations of λ̂kα and γ̂kα, the asymptotic properties of the usual bias and variation terms in the strict sense may not be easily studied. Heuristic calculations of the usual bias and variation terms in the strict sense were performed in Section 6 of Silverman [15].
Note that even if the multiplicity of λk is one, we cannot uniquely determine γk because −γk is also an eigenfunction. In the following theorem, by “given γk” we mean that not only is γk an eigenfunction, but also the direction (sign) of γk is given.
Define
(4.3) |
Theorem 4.1
Under Assumptions 1 – 3, for any 1 ≤ k ≤ K and 0 ≤ α ≤ α0,
(4.4) |
Given γk, 1 ≤ k ≤ K, we can uniquely choose γkα for each α ∈ [0, α0] such that γkα is a continuous function of α and (γkα, γk) > 0 for all 0 ≤ α ≤ α0, and we have
(4.5) |
Remark
- If K is fixed or bounded, we have
Hence, the convergence rates for eigenvalues and eigenfunctions are different: eigenvalues have faster convergence rates than eigenfunctions. As K → ∞, we have α0 → 0. If we choose α in such a way that 0 ≤ α ≤ α0 and the right hand sides of (4.4) and (4.5) converge to zero, then the estimation errors for the eigenvalues and eigenfunctions converge to zero for all 1 ≤ k ≤ K.
The convergence rates for both eigenvalues and eigenfunctions depend on Lk. If the eigenfunctions are less smooth, that is, Lk is large, then the convergence is slow.
- (4.4) and (4.5) give the upper bounds. However, the lower bounds are 0 for any k ∈ ℕ. Here is a simple example. Without loss of generality, let k = 2. Suppose [a, b] = [0, 2π],
Note that the right hand side in the above equality converges both uniformly and in L2([0, 2π] × [0, 2π]) to a strictly positive definite covariance function. Its first eigenvalue and eigenfunction are 2 and , the second ones are 1 and . It is interesting to note that the eigenfunctions of Γ are the same as the solutions of the successive optimization problems (3.3) and (3.4). The first maximum value of the successive optimization problems (3.3) and (3.4) is and the second one is still 1. That is, in this case, we have and for any α, hence the lower bounds are zero.
Define Cℝ[0, α0] to be the normed space of all continuous real functions on [0, α0] equipped with the norm sup0≤α≤α0 | · |. Let Π1≤j≤K Cℝ[0, α0] denote the product space of K copies of Cℝ[0, α0]. Define CL2([a,b])[0, α0] to be the normed space of all continuous functions on [0, α0] taking values in L2([a, b]), equipped with the norm sup0≤α≤α0 ‖ · ‖. Similarly, we define Π1≤j≤K CL2([a,b])[0, α0].
For each 1 ≤ k ≤ K and each n, we will view γ̂kα as a stochastic process with index α ∈ [0, α0] and values in L2([a, b]), and view λ̂kα as a stochastic process with index α ∈ [0, α0] and values in ℝ. However, on the following subset of the probability space,

(4.6)

the γ̂kα are not uniquely determined up to signs. We will show in the proof of the following theorem that Ω0 is measurable and that its probability goes to zero as n → ∞. Hence, how we define γ̂kα on Ω0 does not affect our asymptotic results. In order to make the development of our theory easier, we will use the following definition
(4.7) |
Theorem 4.2
Under Assumptions 1 – 3 and the definition (4.7), we can properly choose the signs of γ̂kα to make the sequence

(4.8)

of stochastic processes measurable with sample paths in

a.s. Furthermore, the sequence converges in distribution to a Gaussian random element with values in and mean zero. Similarly, the sequence
(4.9) |
of stochastic processes has sample paths in a.s. and converges in distribution to a Gaussian random element with values in and mean zero.
Remark
Recall the definition of Gaussian random elements in a separable Banach space. Suppose that X is a random element with values in a Banach space B with mean zero. Then X is a Gaussian random element if for any bounded linear functional f, f(X) is a Gaussian random variable. If X is a Gaussian random element, we can define its covariance operator Q. Q is a bounded operator from the dual space B′ to B such that for any f, g ∈ B′, g(Qf) = E[f(X)g(X)]. Note that the distribution of a Gaussian random element with values in a Banach space and mean zero is determined by its covariance operator. For further properties of Gaussian random elements in Banach spaces, see Ledoux and Talagrand [10].
The covariance operators of (4.8) and (4.9) can be characterized by the “half-smoothing” operator Sα defined in (2.5) and the limit distribution of √n(Γ̂n − Γ). However, the characterization involves some technical definitions. The reader can find the characterization in the proof of this theorem.
The measurability and the a.s. continuity of the sample paths of the processes (4.8) and (4.9) are not obvious at all.
The convergences of (4.8) and (4.9) are weak convergences of probability measures on the product spaces defined above, which are stronger than convergence of only the marginal distributions of (4.8) and (4.9).
Now from Theorem 4.1 and Theorem 4.2, we have the following corollaries.
Corollary 4.1
Under Assumptions 1 – 3, for any 1 ≤ k ≤ K and 0 ≤ α ≤ α0
(4.10) |
where
Remark
From Corollary 4.1, it seems that smoothing (that is, α > 0) is unnecessary, since when α = 0 we get the best order n−1/2. We clarify this point with the following remarks.
- Both Silverman [15] and this paper consider the ideal situation where every sample curve is observed at all points in [a, b] without any noise or measurement error. Although in this situation the estimates are consistent when α = 0, smoothing is advantageous.
- – First, because the “bias terms” and the “variation terms” are not the bias and the variation in the strict sense, they are correlated. Since the upper bounds on the right hand sides of (4.10) are the sums of the upper bounds for the bias terms and the variation terms, the upper bounds in (4.10) are actually for the cases in which the bias terms and the variation terms are positively correlated. These are the worst cases when we introduce smoothing. In some cases, such as those in Section 6.3 of Silverman [15], the mean squared errors for some α > 0 are less than those for α = 0. For these cases, it is possible that the bias terms and the variation terms are negatively correlated, and hence the estimation errors should be much less than the upper bounds in (4.10). Section 6.4 of Silverman [15] gave an optimal α with order for estimates of eigenfunctions. By Corollary 4.1, if we choose the optimal α, we obtain the best asymptotic rates . Even for the worst cases, if we take , we can obtain the rate .
- – Second, from a practical viewpoint, it is desirable that the estimates of the principal component curves preserve the main patterns of the true principal component curves. However, the sample curves of many stochastic processes are nonsmooth or even discontinuous, such as the examples in Remark (3) after Assumption 3. Hence, their sample covariance functions have many local variations, and so do the eigenfunctions of those sample covariance functions. In these cases, the local variations can be removed by using an appropriate amount of smoothing, that is, by choosing an appropriate positive α.
In practice, people cannot observe the entire sample curves. The observations can only be made at discrete points, often with noise or measurement error. The observation points could be dense or sparse. If the sample curves are smooth and the observation points are dense, we can obtain a smoothed estimate of each sample function and perform the usual functional PCA. This method cannot be applied to other situations. However, Silverman's method can be generalized to all these situations (see Qi and Zhao [11]). In our generalization, smoothing is essential and the smoothing parameters must be positive. The theoretical results in this paper have been applied to prove the consistency results in Qi and Zhao [11].
If α goes to 0 fast enough as n → ∞, we have the following asymptotic normalities.
Corollary 4.2
Under Assumptions 1 – 3, for any sequence {αn, n ≥ 1} with , the joint distributions of
converge to the same Gaussian distribution with mean zero. For any sequence {αn, n ≥ 1} with , the joint distributions of
converge to the same Gaussian distribution with mean zero.
Remark
Dauxois et al. [3] established the asymptotic normality of the eigenvalues and eigenfunctions of Γ̂n and characterized the covariance operators of the limiting Gaussian random elements. Those results are special cases of Corollary 4.2 with all αn equal to zero. Therefore, all the limiting Gaussian distributions in Corollary 4.2 are the same as those in Dauxois et al. [3].
5. Proofs
Proof of Theorem 3.1
By Remark (3) after Assumption 1, ∥Γ̂n∥ < ∞ a.s. Fix a sample and α ≥ 0 such that ∥Γ̂n∥ < ∞. Consider the Hilbert space W2²([a, b]) equipped with the inner product (·,·)α. For any f, g ∈ W2²([a, b]), the functional (f, Γ̂ng) defines a bilinear form on W2²([a, b]) and

|(f, Γ̂ng)| ≤ ∥Γ̂n∥ ∥f∥ ∥g∥ ≤ ∥Γ̂n∥ ∥f∥α ∥g∥α.

Hence, there is a unique bounded operator Rα on W2²([a, b]) such that for any f, g ∈ W2²([a, b]),

(f, Γ̂ng) = (f, Rαg)α

(see Section 84 in Riesz and Sz.-Nagy [13]). It is easy to see that Rα is symmetric and nonnegative-definite. We want to show that Rα is a compact operator (note that a compact operator is called a completely continuous operator in Riesz and Sz.-Nagy [13]). By Definition 4 in Section 85 of Riesz and Sz.-Nagy [13], we only need to show that for any bounded sequence {fm} in W2²([a, b]), one can select a subsequence {fmk} such that
(5.1) |
as k, l → ∞. Because Γ̂n is a compact operator in L2([a, b]) (see Remark (2) after Assumption 1) and {fm} is also a bounded sequence in L2([a, b]), one can select a subsequence {fmk} such that {Γ̂nfmk} converges; then (5.1) is true for {fmk}. Hence Rα is a compact operator. It has eigenvalues λ̂1α ≥ λ̂2α ≥ ⋯ and corresponding eigenfunctions γ̂jα, j ∈ ℕ. They are the solutions of the successive optimization problems (3.3) and (3.4) (see Chapter 3 of Weinberger [16]). Now for any f ∈ W2²([a, b]) and any j ∈ ℕ, because Rαγ̂jα = λ̂jαγ̂jα,
we have

(f, Γ̂nγ̂jα) = (f, Rαγ̂jα)α = λ̂jα(f, γ̂jα)α,

which is (3.5).
Proof of Theorem 4.1
The proof of the existence and uniqueness of the choices of the signs of γkα, 1 ≤ k ≤ K, making them continuous functions of α will be postponed to the proof of Theorem 4.2, because we need some technical lemmas proved there. We will assume that we can choose the signs of γkα, 1 ≤ k ≤ K, such that they are continuous functions of α for all 0 ≤ α ≤ α0 and γk0 = γk, 1 ≤ k ≤ K.
For any 1 ≤ k ≤ K, let Pk be the orthogonal projection operator in L2([a,b]) onto the space spanned by {γ1,… ,γk} and I be the identity operator in L2([a, b]). Then (I − Pk) is the orthogonal projection operator onto the closed subspace spanned by {γj,j ≥ (k + 1)}.
Lemma 1
For any k ∈ ℕ and α1 ≥ α2 ≥ 0,
Proof. It follows from Theorem 8.1 in Chapter 3 of Weinberger [16].
Lemma 2
For any 1 ≤ k ≤ K and α ≥ 0, we have
(5.2) |
Proof. For any j < k, by (3.6), we have
So
By Assumption 2 and Lemma 1, . Therefore,
and we have
where the last inequality in the second line follows from the Cauchy-Schwarz inequality.
Lemma 3
For any 1 ≤ k ≤ K and any
(if k = 1, the right hand side is defined to be infinity), we have
(5.3) |
Furthermore, if
(if k = 1, the right hand side is defined to be infinity), we have
(5.4) |
For any α ≥ 0, we have
(5.5) |
Hence, as α → 0, .
Proof. Let span(γ1, … , γk) denote the linear subspace spanned by γ1, … , γk.
From Theorem 5.1 (Poincare's Principle) in Chapter 3 of Weinberger [16], we have
(5.6) |
where the equality in the third line of (5.6) is true because (I − Pk−1) is the orthogonal projection operator onto the closed subspace spanned by {γj, j ≥ k}, which is orthogonal to span(γ1, … , γk−1), and both of them are invariant subspaces of Γ. The last inequality in (5.6) holds because the largest eigenvalue of Γ restricted to the closed subspace spanned by {γj, j ≥ k} is λk and the L2 norm of is less than 1. On the other hand, we have
(5.7) |
The equality in the last line follows from the fact that the smallest eigenvalue of Γ in span(γ1, … , γk) is λk. The last inequality holds because, for any β ∈ span(γ1, … , γk), we may write β = c1γ1 + ⋯ + ckγk, where c1, … , ck are some real numbers, and then we have
where the inequality in the second line is due to Cauchy-Schwarz inequality. Now from (5.6), (5.7) and Lemma 1, we have
From these inequalities, it can be derived that
Therefore, as α → 0.
Again by (5.6), (5.7), and note that , we have
Then
hence,
.
Now by (5.2), we have
After rearranging the terms, we then obtain
When the expression in braces on the left of the above inequality is positive, which is equivalent to
(if k = 1, the right hand side is defined to be infinity), we have
(5.8) |
When
(if k = 1, the right hand side is defined to be infinity), it can be shown that
and then it follows from (5.8) that
Lemma 4
For any 1 ≤ k ≤ K and any
(5.9) |
we have
(5.10) |
Proof. By the following orthogonal decomposition
(5.11) |
we have
(5.12) |
where the last inequality follows from the fact that belongs to the closed subspace spanned by {γj, j ≥ k + 1} in which the largest eigenvalue of Γ is λk+1. On the other hand, by (3.6), we have
(5.13) |
then
(5.14) |
It follows from (5.9) that . Then by (5.5), we have
hence,
(5.15) |
Because
we have
(5.16) |
From (5.14), (5.15) and (5.16),
Now by Lemma 2,
Now we can prove Theorem 4.1. It follows from the definition (4.3) of α0 that all the conditions in Lemmas 3 and 4 are satisfied. From the orthogonal decomposition
we have
Hence, it follows from Lemma 2, Lemma 4 and (5.4) in Lemma 3 that
(5.17) |
Define
(5.18) |
By solving the following inequalities,
we obtain . Since
By the definition (4.3) of α0 and (5.18), we have
Hence, for any 0 ≤ α ≤ α0, we have . Now it follows from (5.17) that, for any 0 ≤ α ≤ α0,
(5.19) |
Because is a continuous function of α, is also a continuous function of α and . Hence, it follows from (5.19) that for all 0 ≤ α ≤ α0.
From (5.16), (5.17) and (5.4), we have
By (5.17) and , we have
and thus
Proof of Theorem 4.2
We first study the properties of the “half-smoothing” operators Sα. As shown at the end of Section 2, Sα is a bounded linear operator from L2([a, b]) to L2([a, b]) with norm less than or equal to 1. Moreover, Sα is a one-to-one (injective) map. Hence, its inverse exists. When α = 0, S0 is just the identity operator I in L2([a, b]). The following lemma gives the reason why Sα is called a “half-smoothing” operator.
Lemma 5
The range of Sα (or the domain of Sα−1) is W2²([a, b]). Moreover, for any f, g ∈ W2²([a, b]),

(Sα−1f, Sα−1g) = (f, g)α.   (5.20)
Proof. If α = 0, the results are trivial. Hence, we assume that α > 0. Since the space C∞[a, b] of smooth functions is dense in the space
for any , there exists a sequence {fm ∈ C∞[a, b], m ∈ ℕ} such that ∥fm − f∥α → 0. One can see that the domain of contains C∞[a, b], hence C∞[a, b] is also in the domain of . Now we compute
(5.21) |
as m, l → ∞. Hence, is a Cauchy sequence in L2([a, b]). It converges to some function, say g, in L2([a, b]). Since Sα is a bounded operator, converges to Sαg in L2-norm. However, fm converges to f in ∥ · ∥α norm, it also converges in L2-norm. Therefore, Sαg = f, that is, f is in the range of Sα. Hence, is in the range of Sα. Because for any m ∈ ℕ, from a similar calculation as in (5.21),
and
we have .
Now we show that the range of Sα is equal to . Since we have shown that is in the range of Sα and Sα is a one-to-one map, we only need to show that the range of under is L2([a, b]). By (5.20) and the completeness of , the range of under is a closed subspace of L2([a, b]). If the range of under is not L2([a, b]), then we can find 0 ≠ h ∈ L2([a, b]) such that
Since one can see that the domain of is contained in , we have
Then
However, because the range of is the whole L2([a, b]), we have Sαh = 0. Hence h = 0 since Sα is a one-to-one map. We get a contradiction. Therefore, the range of Sα is equal to .
Lemma 6
λ̂jα and Sα−1γ̂jα, j ∈ ℕ, are the eigenvalues and eigenfunctions of the compact operator SαΓ̂nSα in L2([a, b]), and λjα and Sα−1γjα, j ∈ ℕ, are the eigenvalues and eigenfunctions of SαΓSα. Moreover, there are no other eigenvalues of SαΓ̂nSα and SαΓSα.
Note that the L2 norms of Sα−1γ̂jα and Sα−1γjα may not be 1.
Proof. If α = 0, the results are trivial. Hence, we assume that α > 0. Because (λ̂jα, γ̂jα), j ∈ ℕ, are the solutions of the successive optimization problems (3.3) and (3.4), by Lemma 5,
Hence, λ̂1α and Sα−1γ̂1α are the first eigenvalue and the corresponding eigenfunction of SαΓ̂nSα. Similarly, we can prove the conclusions for the other eigenvalues and eigenfunctions.
Define H to be the space of all compact operators from L2([a, b]) to L2([a, b]), equipped with the norm (2.1):

(5.22)

For the definition and properties of compact operators in Banach spaces, we refer the reader to Chapter 21 in Lax [9]. Define a sequence of stochastic processes

Zn(α) = √n Sα(Γ̂n − Γ)Sα, 0 ≤ α ≤ α0,

which is indexed by α and takes values in H, because both Γ̂n and Γ are compact operators and Sα is a bounded operator. Note that Zn(0) = √n(Γ̂n − Γ). We follow the notation of Dauxois et al. [3]. Let F denote the space of Hilbert-Schmidt operators from L2([a, b]) to L2([a, b]). Then F is a Hilbert space with an inner product denoted by < ·, · >F. By Assumption 1,
Thus Γ̂n, Γ ∈ F. It follows from Proposition 5 in Dauxois et al. [3] that {Zn(0), n ∈ ℕ}, regarded as a sequence of random elements with values in F, converges in distribution to the Gaussian random element in F with mean 0 and covariance operator Q, where
(5.23) |
X ⊗ X denotes the bounded operator from L2([a, b]) to L2([a, b]) with (X ⊗ X)(γ) = (γ, X)X for any γ ∈ L2([a, b]). Γ⊗̃Γ denotes the bounded operator from F to F with (Γ⊗̃Γ)(Λ) = 〈Λ,Γ〉F Γ for any Λ ∈ F. The other terms in (5.23) are defined similarly. Note that according to the definition (5.23), Q is an operator from F to F. However, because F is a Hilbert space, there is an isometry between F and its dual space F′. Hence, Q can be regarded as a bounded operator from F′ to F, and then it satisfies the definition of covariance operators in Remark (1) after Theorem 4.2. However, in this paper, we will consider the space H of compact operators, which is larger than the space F of Hilbert-Schmidt operators (every Hilbert-Schmidt operator is compact). In the proof of Proposition 6 in Dauxois et al. [3], the authors used the fact that if A is a Hilbert-Schmidt operator, then (A − zI)−1 is also a Hilbert-Schmidt operator, where z is a complex number which is not an eigenvalue of A and I is the identity operator. However, this is not true in general. But (A − zI)−1 is a bounded operator. Because the norm (2.1) in H is smaller than the norm in F, the embedding map i : F ↪ H (i maps any Hilbert-Schmidt operator to itself) is a bounded operator. Then we have
Lemma 7
{Zn(0), n ∈ ℕ}, regarded as a sequence of random elements with values in H, converges in distribution to a Gaussian random element in H with mean zero and covariance operator iQi*, where i* is the adjoint operator of i and Q is defined in (5.23).
Proof. It follows immediately from the following lemma.
Lemma 8
Suppose that {Xn, n ≥ 1} is a sequence of random elements with values in a Banach space B, and that Xn converges in distribution to a Gaussian random element X with mean zero and covariance operator Λ. Let T be a bounded operator (that is, a continuous linear map) from B to another Banach space C. Then T(Xn) converges in distribution to T(X), which is also a Gaussian random element with mean zero and covariance operator TΛT*, where T* is the adjoint operator of T.
Proof. Since T is a continuous map from B to C, by the continuous mapping theorem, T(Xn) converges in distribution to T(X). Now we show that T(X) is a Gaussian random element. For any bounded linear functional f ∈ C′, fοT ∈ B′. Hence, f(T(X)) = f ο T(X) is a Gaussian random variable since X is Gaussian. Thus T(X) is Gaussian and obviously its mean is zero. In order to compute its covariance operator, we introduce the following notation. For any x ∈ B, y ∈ C and f ∈ B′, g ∈ C′, define 〈x, f〉B = f(x), 〈y, g〉C = g(y). By the definition of covariance operators (see Remark (1) after Theorem 4.2) and the definition of adjoint operators, for any g, h ∈ C′,
Therefore, the covariance operator of TX is TΛT*.
Lemma 9
For any finite collection 0 ≤ α1 < … < αk ≤ α0, the sequence

(Zn(α1), …, Zn(αk)), n ∈ ℕ,
converges in distribution to a Gaussian random element with values in Hk and mean zero, where Hk is the product space of k copies of H.
Proof. This lemma follows from Lemma 8 and the fact that
is a continuous and linear function of Zn(0), since Sαi, i = 1, … , k, are bounded operators.
Unfortunately, Sα is not continuous as α → 0 under the norm (2.1). For example, let
By (5.20),
Define . Then ∥gn∥ = 1 and
Therefore, ∥Sα − I∥ ≥ 1 for all α > 0. Note that S0 = I. However, we have the following results.
Lemma 10
For any f ∈ L2([a, b]), α → Sαf is a continuous map from [0, α0] to L2([a, b]).
Proof. Let E be the resolution of the identity for the self-adjoint operator Sα0 (for reference, see Chapter 12 of Rudin [14]). Because Sα0 is a positive operator with ∥Sα0∥ ≤ 1, Ef,f is a bounded positive Borel measure on [0, 1]. Fix α ∈ [0, α0].
Now define a family of continuous functions on [0, 1],
then Sα = φα(Sα0). Let α′ ∈ [0, α0] and α′ → α. It follows from Theorem 12.21 and 12.23 in Chapter 12 of Rudin [14] that
The integrand on the right hand side is bounded. If α ≠ 0, the integrand converges to 0 at each point of [0, 1] as α′ → α. By the bounded convergence theorem, ∥(Sα′ − Sα)f∥² → 0. If α = 0, the integrand converges to 0 at each point of [0, 1] except 0. If we can show that Ef,f({0}), the measure of the set {0} under Ef,f, is zero, then by the bounded convergence theorem, we still have ∥(Sα′ − Sα)f∥² → 0. In fact, for any g ∈ L2([a, b]),
Hence, Sα0 E({0})f = 0. Because Sα0 is a one-to-one operator, E({0})f = 0. Therefore,
Lemma 11
For any compact operator Λ in L2([a, b]), α → SαΛSα is a continuous map from [0, α0] to H.
Proof. By Lemma 11 in Section XI.9 of Dunford and Schwartz [5], there exists a sequence Λm of bounded operators having finite-dimensional ranges, such that ∥Λm − Λ∥ → 0. If we can show that for each m, α → SαΛmSα is a continuous map, then since ∥SαΛmSα − SαΛSα∥ ≤ ∥Λm − Λ∥ → 0 uniformly in α, α → SαΛSα is continuous. Now fix m and 0 ≤ α ≤ α0. Let {e1, …, ek} be an orthonormal basis of the range of Λm and let α′ → α. For any f ∈ L2([a, b]) with ∥f∥ ≤ 1,
Because
which converges to 0 uniformly for all f ∈ L2([a, b]) with ∥f∥ ≤ 1 by Lemma 10. Now
which converges to 0 uniformly for all f ∈ L2([a, b]) with ∥f∥ ≤ 1 by Lemma 10, where Λm* is the adjoint operator of Λm. Hence, ∥Sα′ΛmSα′ − SαΛmSα∥ → 0.
In the next lemma, we assume that all the eigenfunctions have norms 1.
Lemma 12
Suppose that α → Λ(α) is a continuous map from [0, α0] to the subspace of H consisting of positive compact operators in L2([a, b]). Assume that the first K eigenvalues of Λ(α) for any α ∈ [0, α0] are positive and mutually different, and each of them has multiplicity 1. Then, given the first K eigenfunctions of Λ(0), there exist unique choices of the first K eigenfunctions of Λ(α) for any α ∈ (0, α0] such that the k-th eigenfunction is a continuous map from [0, α0] to L2([a, b]) for any 1 ≤ k ≤ K.
Note that for each 1 ≤ k ≤ K and 0 ≤ α ≤ α0, there exist two eigenfunctions with norm 1 of Λ(α) corresponding to its k-th eigenvalue, and each one is equal to the other multiplied by −1.
Proof. Let λk(α), 1 ≤ k ≤ K, denote the first K eigenvalues of Λ(α). Let Ek(α) be the orthogonal projection onto the space spanned by the k-th eigenfunction, 1 ≤ k ≤ K, 0 ≤ α ≤ α0. Note that Ek(α) does not depend on the sign of the eigenfunction.
We first show that for any 1 ≤ k ≤ K, Ek(α) is a continuous function from [0, α0] to H. For any fixed α ∈ [0, α0], we can find a small positive number εα such that the K + 1 intervals
are disjoint. Since Λ(α) is a continuous function, we can choose a neighborhood ℳα of α in [0, α0], such that for any α′ ∈ ℳα
where the first inequality follows from Corollary 4 in Section XI.9 of Dunford and Schwartz [5]. Now we define K circles on the complex plane ℂ,
Then one can see that for any α′ ∈ ℳα, the disk bounded by the circle Ck only contains the k-th eigenvalues of Λ(α′). Hence, we have (see Section VII.3 of Dunford and Schwartz [4] or Definition 10.26 in Rudin [14])
for any α′ ∈ ℳα. Since (zI − Λ(α′))−1 is a continuous function of z ∈ Ck and Ck is a compact set, we have
(5.24) |
Since Λ(α) is a continuous function of α, for any 0 < δ < 1, we can find a neighborhood 𝒩α of α such that
(5.25) |
Now for any α′ ∈ ℳα ⋂ 𝒩α,
(5.26) |
Since δ can be arbitrarily small, Ek(α) is continuous at α.
Now we show that for any given α ∈ [0, α0], and given , there exists a neighborhood [α1, α2] of α such that for any α′ ∈ [α1, α2], we can uniquely choose such that is continuous in this neighborhood. Because Ek(α′) is a continuous function of α′, is a continuous function of α′ and its value is 1 at α′ = α. Hence, we can find a neighborhood [α1, α2] of α such that for α′ ∈ [α1, α2]. Then
are eigenfunctions and continuous in [α1, α2]. Now we show the uniqueness. Suppose , α′ ∈ [α1, α2] is another choice of the eigenfunctions such that it is continuous and . If for some , , we have . Since both the inner products and are continuous functions for α′ ∈ [α1, α2]. By the choice of [α1, α2], . Because , one of them must be negative. Without loss of generality, we assume that . Since , it follows from the intermediate value theorem that there is at least one point α‴ between α and α″ such that . However, it is impossible because
Hence we have proved the uniqueness.
Fix . Let 𝒱 be the set
By the arguments in the last paragraph, 𝒱 is nonempty. Now we show that the set 𝒱 is an open set. Suppose that α* is any point in 𝒱. It follows from the last paragraph that there exists a neighborhood [α1, α2] of α* such that given e[α*], we can uniquely choose the sign of e[α] for any α ∈ [α1, α2] to make e[α], α ∈ [α1, α2] a continuous function. We show that [α1, α2] ⊂ 𝒱. Let α** be any point in [α1, α2]. It is easy to see that we can choose the signs of e[α] for all α ∈ [0, α**] such that e[α] is a continuous function of α in [0, α**]. We only need to show the uniqueness of e[α]. The uniqueness is obvious if α** ≥ α* since α* ∈ 𝒱. Hence we assume that α** < α*. We will proceed by contradiction. Assume that there are two different continuous functions and , 0 ≤ α ≤ α**. By the definition of [α1, α2], we can choose a continuous function , α** ≤ α ≤ α*. Define
and
Then and are two different continuous functions in [0, α*], which contradicts α* ∈ 𝒱. Hence, 𝒱 is an open set.
Now if we can prove that 𝒱 is also a closed set, then 𝒱 = [0, α0]. Let αm ∈ 𝒱 be a sequence of positive numbers converging to α ∈ [0, α0]. If αm ≥ α for some m, it is obvious that α ∈ 𝒱. Hence we assume that αm < α for all m. Then we can uniquely choose the signs of such that is continuous in [0, α). Let be one of the two eigenfunctions with norm 1. Because for any α′ < α

goes to zero as α′ → α, . Since is continuous in [0, α), converges either to 1 or −1. In the latter case, we change to . Hence, without loss of generality, we assume that as α′ → α. Now one can see that is continuous on [0, α] and its uniqueness is obvious. Hence, α ∈ 𝒱. We have proven that 𝒱 is a closed set.
Define CH[0, α0] to be the space of all continuous functions from [0, α0] to H (see Chapter 3 of Billingsley [1]). For any {Λ(α) : 0 ≤ α ≤ α0} ∈ CH[0, α0], define the norm

∥Λ∥ = sup0≤α≤α0 ∥Λ(α)∥.   (5.27)
Under the norm (5.27), CH[0, α0] is a Banach space. Recall the definition

Zn(α) = √n Sα(Γ̂n − Γ)Sα, 0 ≤ α ≤ α0.

By Lemma 11, we can regard the stochastic processes Zn on [0, α0] as random elements with values in CH[0, α0]. Define a linear map Θ: H → CH[0, α0] such that for any compact operator U ∈ H,

Θ(U)(α) = SαUSα, 0 ≤ α ≤ α0.   (5.28)
Lemma 13
Θ is a bounded operator and the sequence {Zn, n ∈ ℕ} of stochastic processes with sample paths in CH[0, α0] converges in distribution to a Gaussian random element with mean zero and covariance operator ΘiQi*Θ*.
Proof. Since the norm of Sα is less than or equal to 1, for any V ∈ H,
Hence, the map (5.28) is continuous and hence a bounded operator. Since Zn = Θ(Zn(0)), the lemma follows from Lemmas 7 and 8.
Now for any 1 ≤ k ≤ K, define
(5.29) |
Note that by Lemma 6, and are the eigenfunctions of SαΓ̂nSα and SαΓSα with norm 1. By (5.29) and because and , we have
(5.30) |
and
(5.31) |
Define , 1 ≤ k ≤ K, and εK = min1≤k≤K ε̃k. Then the K + 1 intervals
(5.32) |
are disjoint. By the definition (4.3) of α0 and (5.5) in Lemma 3, for any 0 ≤ α ≤ α0 and 1 ≤ k ≤ K,
(5.33) |
Hence, the λkα are mutually different for all 0 ≤ α ≤ α0. Now, given γk, 1 ≤ k ≤ K, by Lemma 11 and Lemma 12, we can uniquely choose the first K eigenfunctions of SαΓSα such that and , 1 ≤ k ≤ K, are continuous functions of α. We have proved the claims about the continuity of γkα, 1 ≤ k ≤ K, made at the beginning of the proof of Theorem 4.1.
Now we define K circles in the complex plane ℂ,
(5.34) |
Note that the K discs bounded by Ck, 1 ≤ k ≤ K, are disjoint and the intersections between these discs and the real line in the complex plane are just the first K intervals in (5.32). Let Ek(α) be the orthogonal projection onto the space spanned by the k-th eigenfunction of SαΓSα, 1 ≤ k ≤ K, 0 ≤ α ≤ α0. Now, because it follows from (5.33) that for any 0 ≤ α ≤ α0 and 1 ≤ k ≤ K the disk bounded by the circle Ck contains only the k-th eigenvalue of SαΓSα, we have
(5.35) |
By Lemma 11, SαΓSα is a continuous function of α. Hence, by a similar calculation as in (5.26), it can be shown that Ek(α) is a continuous function of α.
Recall that we define in (4.6)
Lemma 14
Ω0 is a measurable set and P(Ω0) → 0 as n → ∞.
Proof. Consider the subset
ε is an open subset of the space of all positive compact operators, which is closed in H; hence it is measurable. Let (Ω, ℱ) be the probability space and ([0, α0], ℬ[0, α0]) be the Lebesgue space. Since SαΓ̂nSα has continuous sample paths, it is jointly measurable in (Ω × [0, α0], ℱ × ℬ[0, α0]). One can see that is the projection of the set {(ω, α) : SαΓ̂nSα ∈ ε} to Ω. Therefore, is measurable, and so is Ω0. By (5.33) and the definition of εK (just above (5.32)), we have
By Corollary 4 in Section XI.9 of Dunford and Schwartz [5],
(5.36) |
Hence,
(5.37) |
by the law of large numbers.
For any ω ∈ Ω0, define to be zero. For any ω ∉ Ω0, define to be the orthogonal projection onto the space spanned by the k-th eigenfunction of SαΓ̂nSα (note that it does not depend on the sign of the eigenfunction). By the same argument as in the proof of Lemma 12, we can show that it is a continuous function of SαΓ̂nSα, so it is measurable and continuous in α. Now let {em, m ∈ ℕ} be a complete orthonormal basis of L2([a, b]); we choose
(5.38) |
in and 0 in Ω0, where χ is the indicator function. Then is measurable and
(5.39) |
Now by Lemma 11, Lemma 12 and the definition of Ω0, for any ω ∉ Ω0, we can uniquely choose , 1 ≤ k ≤ K, such that , 1 ≤ k ≤ K are continuous functions of α. is measurable by the following lemma. By (5.31), , 1 ≤ k ≤ K are continuous and measurable with .
Lemma 15
For any 1 ≤ k ≤ K, is a measurable map to CL2([a,b])[0, α0].
Proof. In , is a continuous function of α. Since , let in . In Ω0, define T̂(1) = 0. Then T̂(1) is a nonnegative random variable. By Lemma 12, we have in , if α ≤ T̂(1),
Define a random element
in and 0 in Ω0. Define a random variable and a random element
in and 0 in Ω0. Similarly, we can define (T̂(3), ζ3), …. One can show that for any ω ∈ Ω0c, there are only finitely many T̂(m)(ω) < α0, m = 0, 1, 2, …, where T̂(0)(ω) = 0. Hence in Ω0c, we have
where and χ is the indicator function. Hence, is measurable.
By (5.33) and (5.36), in the event , for any 0 ≤ α ≤ α0, 1 ≤ k ≤ K, the disk bounded by the circle Ck contains only the k-th eigenvalues of SαΓ̂nSα and SαΓSα. Hence, in the event , for any 0 ≤ α ≤ α0, 1 ≤ k ≤ K, we have
(5.40) |
The proofs of the following Lemma 16 and Lemma 17 follow the ideas of Section 2 in Dauxois et al. [3]. Define linear maps ϕk : CH[0, α0] → CH[0, α0], 1 ≤ k ≤ K, such that for any Λ ∈ CH[0, α0] and 0 ≤ α ≤ α0,
(5.41) |
where (ϕk(Λ))(α) denotes the value of ϕk(Λ) at the point α. Then define ΦK = (ϕ1, ϕ2, …, ϕK), which is a linear map from CH[0, α0] to the product space of K copies of CH[0, α0]. One can verify that the ϕk are continuous. Hence ΦK is a bounded operator.
Lemma 16
The sequence of stochastic processes has sample paths in a.s. and converges in distribution to a Gaussian random element with mean zero and covariance operator .
Proof. In the event , for each z ∈ CK,
(5.42) |
If
where
then by (5.42), we have an absolutely convergent series expansion
Hence,
(5.43) |
where
Hence, in the event ,
(5.44) |
Now in the event , by (5.42) and (5.43),
(5.45) |
Now we have from (5.44) and (5.45), for any δ > 0,
(5.46) |
as n → ∞. By Lemmas 8 and 13, ΦK(Zn) = (ϕ1(Zn), ϕ2(Zn), …, ϕK(Zn)) converges in distribution to a Gaussian random element with mean zero and covariance operator . Now by (5.46), converges in distribution to the same limit.
Define linear maps Ψk : CH[0, α0] → CL2([a,b])[0, α0], 1 ≤ k ≤ K, such that for any Λ ∈ CH[0, α0],
(5.47) |
Then we define a linear map such that for any ,
(5.48) |
It is easy to see that ψK is a bounded operator.
Lemma 17
The sequence of stochastic processes has sample paths in a.s. and converges in distribution to a Gaussian random element with mean zero and covariance operator .
Proof. By the definitions (5.29) of . In , we have
By (5.46), and ϕk(Zn) have the same limit distribution. Because for any Λ ∈ CH[0, α0],
(5.49) |
where we use the facts that
So we have
in probability. By (5.39) and the continuities of and , we have
(5.50) |
in probability. Now
(5.51) |
By (5.50), the first term in the last line converges to 0 in probability and in probability. Hence, has the same limit distribution as which converges to a Gaussian random element with mean zero and covariance operator by Lemmas 8 and 16.
Define linear maps ϕk : CH[0, α0] → CH[0, α0], 1 ≤ k ≤ K, such that for any Λ ∈ CH[0, α0],

where Ψk is defined in (5.47) and (Ψk(Λ))(α) denotes the value of Ψk(Λ) at α. Define a linear map ℧K such that for any (Λ1, …, ΛK),
(5.52) |
It is easy to see that ℧K is a bounded operator.
Lemma 18
The sequence of stochastic processes has sample paths in and converges in distribution to a Gaussian random element with mean zero and covariance operator .
Proof. The continuities of and follow from Lemma 11 and the inequalities
for any 0 ≤ α, α′ ≤ α0. In ,
(5.53) |
By Lemmas 16 and 17, and in probability. Hence by (5.53), has the same limit distribution as
which, by (5.51), has the same distribution as
Hence, has the same limit distribution as which converges to a Gaussian random element with mean zero and covariance operator by Lemmas 8 and 16.
Define a linear map such that for any ,
(5.54) |
ℑK is a bounded operator.
Lemma 19
The sequence of stochastic processes has sample paths in a.s. and converges in distribution to a Gaussian random element with mean zero and covariance operator .
Proof. By (5.31),
Therefore,
(5.55) |
Because
in probability, by the definition (5.54) of ℑK, (5.55) and Lemma 17, has the same limit distribution as which converges to a Gaussian random element with mean zero and covariance operator .
Proof of Corollary 4.1
By Lemma 18 and Lemma 19, the stochastic processes and converge in distribution; hence they are tight by Theorem 5.2 in Billingsley [1], since CH[0, α0] and CL2([a,b])[0, α0] are both complete and separable. Therefore, for any ϵ > 0, one can find a positive number M depending on ϵ such that
In other words,
uniformly in α, which, combined with Theorem 4.1, gives the corollary.
Proof of Corollary 4.2
First, we have decompositions
Under the conditions on αn for eigenvalues and eigenfunctions respectively, by Theorem 4.1, we have and respectively. Since and converge in distribution by Theorem 4.2, they are tight. Hence, the asymptotic normalities of and follow from Theorem 4.2 and the following lemma. The corollary then follows at once.
Lemma 20
Suppose that F is a metric space with distance d. Let CF[0, α0] denote the space of continuous functions on [0, α0] taking values in F. Suppose we have a sequence {Yn(α), 0 ≤ α ≤ α0, n ∈ ℕ} of stochastic processes with sample paths in CF[0, α0]. Assume that Yn is tight and that Yn(0) converges in distribution to a random element Y in F; then for any sequence αn of positive numbers converging to 0, Yn(αn) also converges in distribution to Y.
Proof. First, we show that for any ϵ > 0 we can find δ > 0 such that
Since Yn is tight, we can find a compact subset Χ of CF[0, α0] such that
We can find a finite number of elements Λ1, …, Λm ∈ Χ such that for any Λ ∈ Χ, we can find i such that . Furthermore, we can find δ > 0 such that,
Now it is easy to see that for any Λ ∈ Χ,
Hence,
If αn ≤ δ, we have
Since ϵ is arbitrary, d(Yn(0), Yn(αn)) → 0 in probability.
Acknowledgments
Supported in part by NIH grant R01 GM59507, a pilot project from the Yale Pepper Center, and NSF grant DMS 0714817.
References
- 1. Billingsley P. Convergence of Probability Measures. 2nd Edition. Wiley-Interscience; 1999.
- 2. Cardot H, Ferraty F, Sarda P. Functional linear model. Statistics and Probability Letters. 1999;45:11–22.
- 3. Dauxois J, Pousse A, Romain Y. Asymptotic theory for the principal component analysis of a random vector function: some applications to statistical inference. J. Multivariate Anal. 1982;12:136–154.
- 4. Dunford N, Schwartz JT. Linear Operators, General Theory, Part 1. Wiley-Interscience; 1988.
- 5. Dunford N, Schwartz JT. Linear Operators, Spectral Theory, Self Adjoint Operators in Hilbert Space, Part 2. Wiley-Interscience; 1988.
- 6. Ferraty F, Vieu P. Nonparametric Functional Data Analysis: Theory and Practice. Springer; 2006.
- 7. Glasserman P. Monte Carlo Methods in Financial Engineering. Springer; 2003.
- 8. Huang JZ, Shen H, Buja A. Functional principal components analysis via penalized rank one approximation. Electron. J. Statist. 2008;2:678–695.
- 9. Lax PD. Functional Analysis. Wiley-Interscience; 2002.
- 10. Ledoux M, Talagrand M. Probability in Banach Spaces: Isoperimetry and Processes. Ergebnisse der Mathematik und ihrer Grenzgebiete, 3. Folge, Band 23. Springer; 2006.
- 11. Qi X, Zhao H. Functional principal component analysis for discretely observed functional data. Submitted, 2010.
- 12. Ramsay JO, Silverman BW. Functional Data Analysis. 2nd Edition. Springer; New York: 2005.
- 13. Riesz F, Sz.-Nagy B. Functional Analysis. Dover Publications; 1990.
- 14. Rudin W. Functional Analysis. 2nd Edition. McGraw-Hill; 1991.
- 15. Silverman BW. Smoothed functional principal components analysis by choice of norm. The Annals of Statistics. 1996;24:1–24.
- 16. Weinberger HF. Variational Methods for Eigenvalue Approximation, 2nd Edition. CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics; 1987.