Published in final edited form as: Comput Stat. 2014 Jun 4;29(6):1497–1513. doi: 10.1007/s00180-014-0503-4

Functional Data Classification: A Wavelet Approach

Chung Chang 1,*, R Todd Ogden 2, Yakuan Chen 2

Abstract

In recent years, several methods have been proposed to deal with functional data classification problems (e.g., one-dimensional curves or two- or three-dimensional images). One popular general approach is the kernel-based method proposed by Ferraty and Vieu (2003). The performance of this general method depends heavily on the choice of the semi-metric. Motivated by Fan and Lin (1998) and our image data, we propose a new semi-metric, based on wavelet thresholding, for classifying functional data. This wavelet-thresholding semi-metric is able to adapt to the smoothness of the data and provides for particularly good classification when data features are localized and/or sparse. We conduct simulation studies to compare our proposed method with several functional classification methods and study the relative performance of the methods for classifying positron emission tomography (PET) images.

Keywords: wavelet thresholding, semi-metric

1 Introduction

The classical problem in classification analysis (discriminant analysis) is to correctly classify a new observation based on a training data set consisting of multivariate data and class memberships for several observations. In recent years, data with complicated structure and high dimensionality have attracted much attention. In particular, in many situations the multivariate observations can be regarded as functional data, such as one-dimensional curves or two- or three-dimensional images (see Ramsay and Silverman, 2005; Ferraty and Vieu, 2006). The ultimate aim of functional data classification is to determine group membership for a newly observed function based on a training sample consisting of observed functions with their corresponding class memberships.

Curve classification was described by Hastie et al. (1995), who dealt with phonemes. Our motivating example arises from a study of depression using positron emission tomography (PET). Binding potential (BP; Gunn et al., 2001) of the serotonin 1A receptor was estimated throughout the brain for each of many subjects drawn from two groups: patients with major depressive disorder and normal controls. Figure 1 shows BP images for three normal controls (top row) and for three subjects with major depressive disorder (bottom row).

Figure 1. The 50th transaxial slices of the binding potential images in PET for normal control subjects (top) and patients with major depressive disorder (bottom).

Due to the high dimensionality of such functional data, traditional classification methods for multivariate data are not generally appropriate. Many methods for functional data classification have been developed in recent years; here we review the contributions most closely related to our proposed method. Hastie et al. (1995) set out the general idea of functional discriminant analysis. Hall et al. (2001) proposed a functional data-analytic method for dimension reduction, regarding signals as curves or functions, and performed quadratic discriminant analysis on the reduced space (FQDA). Cuevas et al. (2007) used the idea of depth to compute robust distances between curves. Berlinet et al. (2008) developed a supervised wavelet-based functional data classification method. Cao and Fan (2009) proposed a kernel-induced random forest method for classifying functional data by defining kernel functions of two curves. Other related functional classification methods include functional generalized linear models (FGLM; Müller and Stadtmüller 2005), functional kernel density estimation (FKDE; Zhu et al. 2012), and functional principal component regression (FPCR; Reiss and Ogden 2007).

Our approach for functional data classification is based on the kernel-based non-parametric approach proposed by Ferraty and Vieu (2003) and requires the choice of a “distance function” (more precisely, a semi-metric). The performance of this approach is greatly affected by the choice of the semi-metric. Ferraty and Vieu's proposed semi-metric is based on functional principal component analysis (FPCA). In this paper we propose an alternative semi-metric based on wavelets and through simulations and real data analysis, we examine whether our proposed semi-metric will allow for improved performance in some situations, such as the noisy image data shown in Figure 1.

In Section 2, we will first review Ferraty and Vieu's kernel approach and describe our motivation for choosing a wavelet-based semi-metric for functional classification. In addition, we will briefly introduce wavelet methods and then describe our approach. We provide simulation results comparing our proposed wavelet method with other functional classification approaches in Section 3 and a real image data application in Section 4. Some brief discussion is given in Section 5.

2 Methodology

The central idea of the kernel-based approach proposed by Ferraty and Vieu (2003) is to compute distances between a given curve to be classified and all curves in the training data. The classification of the new curve is based on the class membership of the curves that are "nearest" to it. In order to compute the distance between curves, some notion of distance must be defined. In fact, only the symmetry and non-negativity properties and the triangle inequality of a metric are needed (the coincidence axiom is not necessary), and therefore only a semi-metric is required. To be precise, a semi-metric d on a space S is a function that maps S × S to R and satisfies the symmetry, non-negativity, and triangle inequality axioms of a metric, but for which d(s1, s2) = 0 does not imply s1 = s2.

2.1 Notation

Before describing the method, we first introduce some notation. Let (X1, Y1), . . . , (Xn, Yn) be independently and identically distributed as (X, Y), where the Xi are random functions taking values in the semi-metric space (S, d) and the Yi are categorical variables. To avoid abstract notation, in this article we take S to be the Hilbert space L2([0, 1]p), where p is the dimension of the domain of the Xi. To further simplify notation, we will use C to denote the support [0, 1]p.

2.2 Review of the kernel method

Ferraty and Vieu (2003) proposed a kernel-based approach to classify curves (i.e., C = [0, 1]). The procedure is described as follows: For a given function x, for each group g, estimate the conditional probability that Y belongs to g:

$$p_g(x) = P\big(Y = g \mid X(t) = x(t),\ t \in C\big), \qquad g \in \{1, \ldots, G\}. \tag{1}$$

Then assign the function x to the group with highest conditional probability.

In order to estimate the conditional probability in (1), Ferraty and Vieu proposed a kernel estimator, defined by

$$\hat{p}_{g,h}(x) = \frac{\sum_{i=1}^{n} I\{Y_i = g\}\, K\big(h^{-1} d(x, X_i)\big)}{\sum_{i=1}^{n} K\big(h^{-1} d(x, X_i)\big)}, \tag{2}$$

where I is the indicator function, h is the bandwidth, and K is a non-negative kernel function with support [0, 1] satisfying $\int_0^1 K(x)\,dx = 1$. Note that in this paper we use the uniform kernel; we also tried the Epanechnikov kernel and the results were very similar.
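To make the estimator concrete, here is a minimal Python sketch of (2) with the uniform kernel. The function name, the dictionary return type, and the NumPy representation of curves are our own choices, and d stands for whichever semi-metric is in use.

```python
import numpy as np

def kernel_estimate(x, X_train, y_train, d, h, groups=(1, 2)):
    """Kernel estimate of P(Y = g | X = x) as in (2), using the uniform
    kernel K(u) = 1{0 <= u <= 1}; d is any semi-metric between curves."""
    dists = np.array([d(x, Xi) for Xi in X_train])
    w = (dists <= h).astype(float)   # uniform kernel: weight 1 within bandwidth h
    if w.sum() == 0:                 # no training curve within distance h of x
        return {g: 1.0 / len(groups) for g in groups}
    return {g: w[np.asarray(y_train) == g].sum() / w.sum() for g in groups}
```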

The semi-metric used in their paper is based on functional principal component analysis (FPCA; Dauxois et al. 1982), in which a random function X can be expressed as

$$X(t) = \sum_{k=1}^{\infty} \left( \int_C X(s)\, v_k(s)\, ds \right) v_k(t),$$

where vk is the kth orthonormal eigenfunction (corresponding to the kth largest eigenvalue) of the covariance function Γ(s, t) = E(X(s)X(t)). The corresponding semi-metric dFPCA is defined by

$$d_{\mathrm{FPCA}}^2(x_1, x_2) = \sum_{i=1}^{L} \left( \int_C \big(x_1(t) - x_2(t)\big)\, v_i(t)\, dt \right)^2,$$

which depends on L, the number of selected functional principal components; L can be determined by cross-validation. To simplify notation, we suppress the dependence of dFPCA on L.
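A sketch of how dFPCA can be computed from sampled curves, assuming the eigenfunctions vk are estimated by an SVD of the centered training matrix and the integrals are approximated by sums over the sampling grid (a constant factor that does not affect the ranking of distances); the function name is ours.

```python
import numpy as np

def fpca_semimetric(x1, x2, X_train, L):
    """d_FPCA(x1, x2): project the difference x1 - x2 onto the top-L empirical
    eigenfunctions of the training curves (X_train is an (n, N) array of
    curves on a common grid); integrals are approximated by grid sums."""
    Xc = X_train - X_train.mean(axis=0)                # center the training curves
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: eigenvectors v_k
    scores = Vt[:L] @ (x1 - x2)                        # <x1 - x2, v_k>, k = 1, ..., L
    return float(np.sqrt(np.sum(scores ** 2)))
```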

2.3 Wavelets

Wavelet bases are commonly used for sparse representation of curves or images. Compared with traditional Fourier bases, wavelet bases provide a degree of localization in space as well as in frequency. A wavelet basis is constructed from a scaling function ("father wavelet") φ and a wavelet function ("mother wavelet") ψ. Any (one-dimensional) function in L2(R) can be approximated by shifted and dilated versions of the father wavelet φ; in fact, L2(R) can be approximated by the union of a nested sequence of subspaces

$$V_0 \subset V_1 \subset V_2 \subset \cdots \subset L^2(\mathbb{R}),$$

where Vj is the span of the functions

$$\left\{ \phi_{j,k} : \phi_{j,k}(x) = 2^{j/2}\, \phi(2^j x - k),\ k \in \mathbb{Z} \right\}.$$

The shifted and dilated versions of the mother wavelet,

$$\left\{ \psi_{j,k}(x) = 2^{j/2}\, \psi(2^j x - k),\ k \in \mathbb{Z} \right\},$$

form an orthonormal basis for a "detail space" Wj, which is the orthogonal complement of Vj in Vj+1. Consequently, L2(R) is the direct sum of V0, W0, W1, . . . , and

$$\left\{ \psi_{j,k},\ k \in \mathbb{Z},\ j = 0, 1, 2, \ldots \right\}$$

along with the functions {φ0,k, k ∈ Z} form an orthonormal basis for L2(R).

In this paper we consider only wavelets with support on a finite interval. Without loss of generality we consider L2([0, 1]). The wavelet bases of L2([0, 1]) can be easily adapted from those for L2(R). We will employ periodic boundary handling that gives 2j basis functions at each resolution level j. For simplicity, throughout this paper, we express the orthonormal basis of L2([0, 1]) by taking the union of

$$\left\{ \psi_{j,k},\ k = 0, \ldots, 2^j - 1,\ j = 0, 1, 2, \ldots \right\}$$

with the mean function, denoted φ−1,0.

In practice, we only observe x = (x(1/N), . . . , x(N/N))T, where we take N = 2J for some integer J. For such sampled data, we can apply the discrete wavelet transform (DWT) to obtain the wavelet coefficients. In the simulations and application described here, we used Daubechies' orthogonal wavelet basis db4, which has 4 vanishing moments. We have also repeated our analyses using other Daubechies basis sets (db5, db6, and db1, the Haar basis) and found that the results depend very little on the choice of basis. The DWT is a linear transform which, applied to a vector x of length N, results in N wavelet coefficients. In matrix form, the vector of wavelet coefficients can be written z = w(x) = Wx, where W is an N × N orthonormal matrix and z = (z1, . . . , zN)T represents the wavelet coefficients arranged in vector form. For convenience in notation, we use a single subscript for wavelet coefficients.
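As an illustration, the flattened coefficient vector z = Wx can be obtained as follows. The use of PyWavelets is our assumption, since the paper does not name its software, but its db4 filter and periodization mode match the construction described above.

```python
import numpy as np
import pywt  # PyWavelets; an assumed tool, not named by the paper

# DWT of a sampled curve with the db4 basis and periodic boundary handling,
# flattened into a single coefficient vector z = w(x) of the same length N.
x = np.random.randn(1024)                              # a sampled curve, N = 2^10
coeffs = pywt.wavedec(x, 'db4', mode='periodization')  # approximation + details by level
z = np.concatenate(coeffs)
assert z.size == x.size                                # orthonormal transform: N in, N out
```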

The extension of one-dimensional wavelet analysis to two or three dimensions may be accomplished by taking tensor products of the wavelets and scaling functions to create basis functions for L2(R2) or L2(R3) and this can also be adapted to the unit square or unit cube, i.e., for L2([0, 1]2) or for L2([0, 1]3) (Daubechies, 1992).
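A sketch of the same flattening for an image, using the tensor-product transform; again PyWavelets is an assumed tool, and the point is only that the downstream semi-metric code is unchanged once the coefficients are arranged as one vector.

```python
import numpy as np
import pywt  # assumed tooling, as above

# Tensor-product extension: the identical flattening works for images.
img = np.random.randn(128, 128)
coeffs2 = pywt.wavedecn(img, 'db4', mode='periodization')
arr, _ = pywt.coeffs_to_array(coeffs2)   # pack all subbands into one array
z2 = arr.ravel()                         # single coefficient vector of length 128 * 128
```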

2.4 Proposed class of semi-metrics

In describing the main idea of our proposed method, we consider only two groups in this paper (i.e., G = 2), noting that it can be easily generalized to more groups. We first divide our data into a training sequence (X1, Y1), . . . , (Xn1, Yn1) and a validation sequence (Xn1+1, Yn1+1), . . . , (Xn1+n2, Yn1+n2), where n1 + n2 = n. We arrange the training sample so that the first n11 samples (X1, Y1), . . . , (Xn11, Yn11) are from group 1 and the remaining samples are from group 2. Then for the training samples, our model can be written:

$$X_i(t_j) = f_1(t_j) + \varepsilon_{ij}, \qquad i = 1, \ldots, n_{11},\ j = 1, \ldots, N,$$
$$X_i(t_j) = f_2(t_j) + \varepsilon_{ij}, \qquad i = n_{11}+1, \ldots, n_1,\ j = 1, \ldots, N. \tag{3}$$

The functions f1 and f2 are the mean functions of groups 1 and 2, respectively; the errors εij, i = 1, . . . , n1, j = 1, . . . , N, are assumed to be i.i.d. random variables with mean zero and variance σ2. For simplicity of presentation, we describe our method in terms of one-dimensional curves, but the same procedure can be applied directly to functions of any dimensionality.

Defining f1 = (f1(1/N), f1(2/N), . . . , f1(N/N))T and f2 similarly, let θ = (θ1, . . . , θN)T denote the wavelet coefficients of f1 − f2, i.e., θ = w(f1 − f2). We can then define the index set

$$P_T = \{ j : |\theta_j| > T \}, \tag{4}$$

which identifies the indices of the wavelet coeffcients that are most different between f1 and f2. Of course when f1 and f2 are not observed (i.e., when only noisy observations are available), this cannot be determined, but the “oracle semi-metric” between two observed functions x1 and x2 may be defined as if PT were provided:

$$d_O(x_1, x_2) = \sqrt{ \sum_{j \in P_T} \big( w(x_1)_j - w(x_2)_j \big)^2 }, \tag{5}$$

where w(x1)j indicates the jth coefficient in the DWT of x1. For simplicity of notation, we suppress the dependence of dO on the selected threshold T. The traditional Euclidean metric can be seen to be a special case of the oracle semi-metric with T = −∞, so that all wavelet coefficients are included (by Parseval's theorem):

$$d_E(x_1, x_2) = \sqrt{ \sum_{j=1}^{N} \big( w(x_1)_j - w(x_2)_j \big)^2 } = \sqrt{ \big( w(x_1) - w(x_2) \big)^T \big( w(x_1) - w(x_2) \big) } = \sqrt{ (x_1 - x_2)^T (x_1 - x_2) },$$

which is based on all of the wavelet coefficients.

In practice, when PT is not provided, we can calculate an empirical version of the index set of coefficients to include in the semi-metric by defining

$$\hat{\theta} = w(\hat{f}_1 - \hat{f}_2), \tag{6}$$

where

$$\hat{f}_1 = \frac{1}{n_{11}} \sum_{i=1}^{n_{11}} X_i, \qquad \hat{f}_2 = \frac{1}{n_1 - n_{11}} \sum_{i=n_{11}+1}^{n_1} X_i \tag{7}$$

are the estimates based on the training data. Then the sample version of PT is simply $\hat{P}_T = \{ j : |\hat{\theta}_j| > T \}$ and the corresponding semi-metric is

$$d_W(x_1, x_2) = \sqrt{ \sum_{j \in \hat{P}_T} \big( w(x_1)_j - w(x_2)_j \big)^2 }.$$
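Putting (6), (7), and the thresholded index set together, a minimal sketch of constructing dW from training curves; the function names and array layout are ours.

```python
import numpy as np
import pywt  # assumed tooling, as above

def dwt_vec(x):
    """Flattened periodized db4 DWT: z = w(x)."""
    return np.concatenate(pywt.wavedec(x, 'db4', mode='periodization'))

def make_dW(X1, X2, T):
    """Build d_W from training curves: X1 and X2 are (n11, N) and (n1 - n11, N)
    arrays for groups 1 and 2. Implements (6), (7), and the index set P_T hat."""
    theta_hat = dwt_vec(X1.mean(axis=0) - X2.mean(axis=0))
    keep = np.abs(theta_hat) > T             # empirical index set
    def dW(x1, x2):
        diff = dwt_vec(x1) - dwt_vec(x2)
        return float(np.sqrt(np.sum(diff[keep] ** 2)))
    return dW
```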

In order to apply this method in a real-data situation, we must choose the threshold T as well as the bandwidth h. We propose to do this using a cross-validation algorithm:

For b = 1, . . . , B (resampling steps):

B1. Randomly permute the n samples to obtain $(X_1^b, Y_1^b), \ldots, (X_n^b, Y_n^b)$ and designate the first n1 samples in the permutation to be the training data. The remaining n − n1 samples comprise the validation data.

B2. For each h and T, and for each function $X_j^b$ in the validation group (j = n1 + 1, . . . , n), compute the estimated conditional probability that $X_j^b$ belongs to group g using the kernel estimator

$$\hat{p}_{g,h,T}^{\,b}(X_j^b) = \frac{\sum_{i=1}^{n_1} I\{ Y_i^b = g \}\, K\big( h^{-1} d_W(X_j^b, X_i^b) \big)}{\sum_{i=1}^{n_1} K\big( h^{-1} d_W(X_j^b, X_i^b) \big)}, \qquad g = 1, \ldots, G.$$

B3. For each h and T, assign $X_j^b$ to the group with the highest estimated conditional probability:

$$\hat{g}^b(h, T, X_j^b) = \arg\max_g\, \hat{p}_{g,h,T}^{\,b}(X_j^b).$$

Optimal h and T values are obtained by minimizing the misclassification rate in the validation sample:

$$(\hat{h}, \hat{T}) = \arg\min_{h,T}\, \frac{1}{n_2 B} \sum_{b=1}^{B} \sum_{j=n_1+1}^{n} I\big\{ Y_j^b \ne \hat{g}^b(h, T, X_j^b) \big\}.$$

Then for a new independent individual with observed X, our proposed classifier will assign this individual to the group

$$\hat{g}(X) = \arg\max_g\, \hat{p}_{g,\hat{h},\hat{T}}(X).$$
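The resampling scheme B1–B3 can be sketched as follows, reusing make_dW and kernel_estimate from the snippets above; the grids, function name, and group labels 1 and 2 are our assumptions. Here X is an (n, N) NumPy array of curves and y an integer array of labels.

```python
import numpy as np
from itertools import product

def choose_h_T(X, y, n1, h_grid, T_grid, B=100, seed=None):
    """Steps B1-B3: repeatedly permute, rebuild d_W on the training part, and
    count validation misclassifications; returns the (h, T) pair minimizing
    the misclassification rate (minimizing the count is equivalent)."""
    rng = np.random.default_rng(seed)
    errors = {(h, T): 0 for h, T in product(h_grid, T_grid)}
    for _ in range(B):
        perm = rng.permutation(len(y))
        Xtr, ytr = X[perm[:n1]], y[perm[:n1]]
        Xva, yva = X[perm[n1:]], y[perm[n1:]]
        for (h, T) in errors:
            dW = make_dW(Xtr[ytr == 1], Xtr[ytr == 2], T)
            for xj, yj in zip(Xva, yva):
                p = kernel_estimate(xj, Xtr, ytr, dW, h)
                errors[(h, T)] += int(max(p, key=p.get) != yj)
    return min(errors, key=errors.get)       # (h_hat, T_hat)
```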

3 Simulation Study

We performed a simulation study to compare the classification accuracy of the kernel method using four different semi-metrics (the FPCA-based dFPCA (FPCA), Euclidean dE (Euclidean), wavelet-based oracle dO (oracle), and empirical wavelet-based dW (wavelet) described in Section 2.4) as well as the four other functional classification algorithms mentioned in Section 1: FGLM, FKDE, FPCR, and FQDA.

We performed simulations for both one-dimensional curves and two-dimensional images. Also, in order to compare the empirical wavelet-based semi-metric with the oracle, we conducted a simulation to see how well the empirical version matched the oracle version.

For each of the four semi-metrics (Euclidean, wavelet, oracle, FPCA), the tuning parameters ĥ and T̂ were chosen based on 100 cross-validation samples. For both the training and validation samples, half of the curves belong to the first group and the other half to the second. To evaluate the performance of each classifier, 500 additional independent test samples were generated to estimate the misclassification rate.

For the one-dimensional curve simulations, we compared all eight functional classification methods, while for two-dimensional images we only compared the wavelet thresholding semi-metric (wavelet) with the Euclidean one (Euclidean), since the available software allows application only to one-dimensional functional data.

3.1 One-dimensional curves

To study the performance of the various semi-metrics in classifying one-dimensional curves, we set N = 1024 in each case for a variety of choices of n. In each case, we set n1 = n/2 to be the size of the training set in each resampling step and placed the remaining observations in the validation sample. We set the simulation model as follows, and consider nine different combinations:

$$f_1(t) = b(t) + s(t), \tag{8}$$
$$f_2(t) = b(t), \tag{9}$$

where t ∈ [0, 1] and b(t) is the “baseline” function and s(t) is the signal that allows discrimination between the two groups. We consider three choices for the baseline function and three choices for the signal, giving nine combinations in all. Our three baseline functions are: b1(t) ≡ 0; b2(t) = sin(4πt) − cos(6πt) (smooth); and

$$b_3(t) = \begin{cases} 1 & \text{if } 1 \le [1024\,t] \ (\mathrm{mod}\ 10) \le 3 \\ 0 & \text{otherwise,} \end{cases}$$

where [x] is the greatest integer less than or equal to x. Our three signal functions are defined:

$$s_1(t) = \begin{cases} 1, & 230/1024 < t \le 250/1024 \\ 0, & \text{otherwise} \end{cases}$$
$$s_2(t) = \begin{cases} 1, & 200/1024 < t \le 250/1024 \\ 0, & \text{otherwise} \end{cases}$$
$$s_3(t) = \begin{cases} 1, & 200/1024 < t \le 210/1024 \ \text{or}\ 300/1024 < t \le 310/1024 \\ 0, & \text{otherwise} \end{cases}$$

Figure 2 and Figure 3 illustrate the three choices of baseline functions and three choices of signal functions, respectively.

Figure 2. Three choices of baseline functions.

Figure 3. Three choices of signal functions.

Figure 4 and Figure 5 show 20 realizations (10 blue curves for group 1 and 10 red curves for group 2) for the simulation setting with smooth baseline and one-bump signal, with standard deviation of the noise equal to 0.9 and 1.8, respectively. The only difference between the two groups is on t ∈ [231/1024, 250/1024] (indicated in the figures by arrows).

Figure 4. Ten simulated curves for group 1 (blue) and ten simulated curves for group 2 (red) for the smooth baseline and one-bump signal. The standard deviation of the noise is 0.9. The arrows mark the beginning and ending locations of the bump.

Figure 5. Ten simulated curves for group 1 (blue) and ten simulated curves for group 2 (red) for the smooth baseline and one-bump signal. The standard deviation of the noise is 1.8. The arrows mark the beginning and ending locations of the bump.

Then the n curves for training and validation were simulated from the models as given in equation (3) with noise {εij, i = 1, . . . , n; j = 1, . . . , N} being iid normal random variables with mean 0 and variance σ2. For each baseline/signal combination, and for each choice of noise level/sample size, this entire procedure was repeated 10 times. We display the aggregate results.
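For concreteness, a sketch of the data-generating step for one of the nine settings (smooth baseline b2 with the one-bump signal s1); the function name and the equal group split are our own choices.

```python
import numpy as np

def simulate_setting(n, sigma, seed=None):
    """Generate n curves (half per group) from model (3) with the smooth
    baseline b2 and one-bump signal s1; other settings swap b and s."""
    rng = np.random.default_rng(seed)
    N = 1024
    t = np.arange(1, N + 1) / N
    b = np.sin(4 * np.pi * t) - np.cos(6 * np.pi * t)          # smooth baseline b2
    s = ((t > 230 / 1024) & (t <= 250 / 1024)).astype(float)   # one-bump signal s1
    y = np.repeat([1, 2], n // 2)
    f = np.where((y == 1)[:, None], b + s, b)                  # group means f1, f2
    return f + sigma * rng.standard_normal((n, N)), y
```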

3.1.1 Differing noise levels

Our first simulation is designed to compare the relative performance of classification among the eight classification algorithms for a range of noise levels. Here we present only the results for n = 100. Simulations for different n values were also performed but the relative performances were very similar.

The standard deviations of the noise were set to 10 levels spaced evenly between 0.9 and 1.8.

Figure 6 demonstrates the effect of the noise level on the performance of the 8 classification methods.

Figure 6. Estimated misclassification rates for 8 classifiers at different noise levels in 9 situations: zero (top row), smooth (middle row), and complicated (bottom row) baselines; one bump (left column), wide bump (middle column), and two bumps (right column).

As would be expected, as the standard deviation of the noise increases, the misclassification rate increases for all classifiers. As a general rule, for all the classifiers and all baseline functions, the misclassification rate for the wide bump signal function is the lowest, and that for the two bump signal function is the highest. Note that both the one bump and the two bumps signal functions have 20 nonzero points (out of 1024 total points).

Since the kernel-based methods based on the Euclidean, wavelet, and oracle semi-metrics depend only on the signal s(t) (or equivalently, only on the difference function f1 − f2), their results do not change across the different choices of baseline function. For the other methods (FPCA, FGLM, FPCR, FKDE, and FQDA), adding a smooth baseline has very little effect on the results, but adding a non-smooth, complicated baseline does result in higher misclassification rates.

As would be expected, the classifier based on the oracle semi-metric dO (oracle) performs better than its empirical counterpart (wavelet), but not much better. Both the wavelet and the oracle methods generally perform better than all of the other classifiers in all situations considered. It is interesting to note that for the wide bump signal with zero or smooth baselines, the FPCA method performs as well as the wavelet and oracle methods. However, when the baseline function is complicated, FPCA's performance suffers even with the wide bump, and it does not perform as well.

3.1.2 Differing sample sizes

Our second simulation compares the relative performance of classifiers based on the eight classification algorithms for different sample sizes ranging from 40 to 240. For each simulation the noise level was set to be σ = 1.5. Results are displayed in Figure 7.

Figure 7. Estimated misclassification rates for 8 classifiers at different sample sizes in 9 situations: zero (top row), smooth (middle row), and complicated (bottom row) baselines; one bump (left column), wide bump (middle column), and two bumps (right column).

As would be expected, as the sample size increases, the misclassification rates decrease in most situations. As seen in the previous section, since the kernel-based methods depending on the Euclidean, wavelet, and oracle semi-metrics depend only on the signal s(t) (or equivalently, the difference function f1 − f2), the results for these methods are not affected by the choice of baseline. For the other five methods (FPCA, FGLM, FPCR, FKDE, and FQDA), again, adding a smooth baseline has very little effect, but adding a non-smooth, complicated baseline does tend to worsen performance in most situations. Also as expected, the classifier based on the oracle semi-metric dO (oracle) performs better than its empirical counterpart (wavelet) but not much better, and the difference in performance becomes small as n increases. Also, as seen in the previous section, these two methods (wavelet and oracle) perform better than (or at least as well as) the other classifiers in all simulation scenarios.

3.1.3 Mimicking the oracle semi-metric

The oracle semi-metric represents an ideal not attainable in practice, but it is informative to investigate how well the empirical version resembles it in various settings. A third simulation study was designed to see how well the coefficients chosen by the empirical wavelet-based semi-metric match those chosen by the oracle semi-metric, for σ = 1.5 and three different sample sizes. For this simulation we use the one-bump signal (but supported on 200/1024 to 220/1024) with the zero baseline.

In Table 1, the magnitude for each wavelet coefficient is defined as the ratio of the squared wavelet coefficient θj² to the total of the squared coefficients. The wavelet basis functions in the oracle semi-metric are ordered by relative importance based on these magnitudes. The first listed coefficient accounts for about 27% of the difference, the second accounts for an additional 21%, and so on; the total magnitude for these 17 coefficients is 95%. In addition, the table lists the frequency (out of 100 simulations) with which each coefficient is selected by the empirical wavelet-based semi-metric. As would be expected, the most important coefficients are selected with high probability and the less important coefficients are selected less often. Furthermore, these frequencies tend to increase as n increases. The oracle semi-metric, depending on the choice of T, would use at most 71 coefficients, since there are only 71 nonzero wavelet coefficients for f1 − f2 (the remaining 953 coefficients are all zero). To gauge the false positive rate, Table 1 also lists the average percentage of these 953 coefficients that were included; it declines from 2.61% to 0.71% as n increases from 60 to 200.

Table 1. Frequency of selection of large-magnitude components using the wavelet-based semi-metric

magnitude    n = 60 (%)    n = 100 (%)    n = 200 (%)
0.2691          100           100            100
0.2090           97            99            100
0.1070           74            86             99
0.0540           67            84            100
0.0529           67            78             98
0.0435           43            56             81
0.0410           42            60             80
0.0281           23            36             49
0.0215           22            30             58
0.0195           22            29             42
0.0192           17            24             27
0.0192           20            27             40
0.0190           19            21             35
0.0189           15            25             35
0.0129           12            16             17
0.0098           10            10             13
0.0079           11            11             13
remaining         2.61          1.48           0.71

3.2 Two-dimensional images

We also conducted a simulation study to compare the relative classification performance of the semi-metrics for two-dimensional images. For the simulated images, we chose the square domain [0, 1]2, divided into a 128 × 128 grid. The simulation model is:

$$f_1(t_1, t_2) = \begin{cases} 1, & 30/128 < t_1 \le 36/128 \ \text{and}\ 30/128 < t_2 \le 36/128 \\ 0, & \text{otherwise} \end{cases}$$

and f2(t1, t2) ≡ 0.

The noise is generated by iid Gaussian random variables. Sample size n was set to 100 and σ ranged from 0.3 to 3.0. For each simulation, resampling was performed 100 times. Figure 8 illustrates the results of using the Euclidean semi-metric and the wavelet thresholding one. The plot shows that as noise level increases, the misclassification rate increases for both semi-metrics. Similarly to what was observed with the one-dimensional signals, the wavelet thresholding semi-metric tends to perform better than the Euclidean one for all the noise levels.

Figure 8. Comparison of classification performance between the wavelet thresholding and Euclidean semi-metrics for two-dimensional images. The x-axis represents noise levels ranging from σ = 0.3 to 3.0 and the y-axis represents the misclassification rate.

4 Application to PET imaging data

To examine the performance of the proposed method on real data, we applied the classification algorithms based on the wavelet thresholding semi-metric and the Euclidean metric to images from a depression study. Collected by Parsey et al. (2006) from 51 healthy controls and 69 subjects with major depressive disorder, these images are maps of the binding potential of 5HT1A receptors, which are believed to play an important role in the disorder. The binding potential is an index that measures how many receptors are available for binding. Images were registered to a common template, resulting in a set of 79 transaxial slices of dimension 91 × 109. We adapted our semi-metric to the wavelet domain in three-dimensional image space. In order to determine h and T, resampling was performed 100 times: the 120 images were randomly divided into two groups, and classification was run repeatedly. For each repetition, 80 images were used to find the optimal parameters and the remaining 40 images were used to test performance. The misclassification rate was estimated to be 0.27 for the wavelet-based semi-metric and 0.46 for the Euclidean metric. The former rate is quite good given the high noise level of such data and the considerable overlap in binding potential measures calculated for various anatomically defined regions of interest.

5 Discussion

We proposed a new semi-metric based on wavelet thresholding for functional data classification. The simulation results showed that when signals are sparse, the classifier based on the wavelet thresholding semi-metric tends to perform considerably better than (or at least comparably to) all the other functional classification methods considered, including FPCA, Euclidean, FGLM, FPCR, FKDE, and FQDA, for all considered noise levels, sample sizes, and simulated scenarios. Furthermore, we found that the wavelet thresholding semi-metric performs similarly to the oracle version, especially for moderate to large sample sizes; this is due to the ability of the empirical wavelet-based version to select the important coefficients with high probability. We also applied our method to classify two groups of 3D binding potential images, where the proposed wavelet thresholding semi-metric performed much better than the Euclidean one. In our experience, since our sample of images is not large (120 images), the proposed resampling scheme is necessary to obtain good classification performance.

One major advantage of taking a wavelet-based approach is that the extension of the semi-metric from one-dimensional signals to two-dimensional and three-dimensional images is quite straightforward; once a basis set is defined (regardless of dimensionality) and the calculated coefficients are arranged as a vector, the procedure is exactly the same. Extensions of other methods, though conceptually straightforward, involve some computational challenges.

Though our method is described for the situation of equal variance, it would be straightforward to extend it to handle unequal variances. If the variance of the noise in (3) is given by, say, Var(εij) = σ2(tj), then the wavelet thresholding semi-metric should be constructed taking this heterogeneity into account. Instead of using f̂1 − f̂2 (see (6) and (7)), we use the normalized difference. That is, for j = 1, . . . , N, we replace f̂1 − f̂2 in (6) by

$$\frac{\hat{f}_1 - \hat{f}_2}{\mathrm{se}(\hat{f}_1 - \hat{f}_2)},$$

where se(f̂1 − f̂2) is the standard error vector for f̂1 − f̂2 and may be calculated as

$$\mathrm{se}(\hat{f}_1 - \hat{f}_2) = \sqrt{ \frac{\sum_{i=1}^{n_{11}} (X_i - \hat{f}_1)^T (X_i - \hat{f}_1)}{n_{11} - 1} + \frac{\sum_{i=n_{11}+1}^{n_1} (X_i - \hat{f}_2)^T (X_i - \hat{f}_2)}{n_1 - n_{11} - 1} }.$$
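A sketch of this normalized difference, interpreting the standard error elementwise at each grid point tj (consistent with its description as a vector); any constant scaling of the se is absorbed when the threshold T is tuned, so normalization factors are not critical. The function name is ours.

```python
import numpy as np

def normalized_difference(X1, X2):
    """Pointwise-normalized mean difference for the unequal-variance case:
    (f1_hat - f2_hat) / se(f1_hat - f2_hat), computed elementwise. Assumes the
    se combines the two groups' sample variances at each grid point."""
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    se = np.sqrt(X1.var(axis=0, ddof=1) + X2.var(axis=0, ddof=1))
    return diff / se
```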

Acknowledgments

The research was supported in part by NIH grants (5 R01 EB009744-03 and 5 R01 MH099003-02) and a grant from the National Science Council of Taiwan (NSC 100-2118-M-110-004).

References

1. Berlinet A, Biau G, Rouvière L. Functional supervised classification with wavelets. Annales de l'ISUP. 2008;52:61–80.
2. Cao J, Fan G. Functional data classification with kernel-induced random forests. 2009. Preprint: http://people.stat.sfu.ca/~cao/Research/FunctionalDataClassification.pdf.
3. Cuevas A, Febrero M, Fraiman R. Robust estimation and classification for functional data via projection-based depth notions. Computational Statistics. 2007;22:481–496.
4. Daubechies I. Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics; Philadelphia: 1992.
5. Dauxois J, Pousse A, Romain Y. Asymptotic theory for the principal component analysis of a vector random function: Some applications to statistical inference. Journal of Multivariate Analysis. 1982;12:136–154.
6. Donoho DL, Johnstone IM. Ideal spatial adaptation via wavelet shrinkage. Biometrika. 1994;81:425–455.
7. Fan J, Lin SK. Test of significance when data are curves. Journal of the American Statistical Association. 1998;93:1007–1021.
8. Ferraty F, Vieu P. Curves discrimination: A nonparametric functional approach. Computational Statistics & Data Analysis. 2003;44:161–173.
9. Ferraty F, Vieu P. Nonparametric Functional Data Analysis: Theory and Practice. Springer; New York: 2006.
10. Gunn RN, Gunn SR, Cunningham VJ. Positron emission tomography compartmental models. Journal of Cerebral Blood Flow and Metabolism. 2001;21:635–652. doi: 10.1097/00004647-200106000-00002.
11. Hall P, Poskitt DS, Presnell B. A functional data-analytic approach to signal discrimination. Technometrics. 2001;43:1–9.
12. Hastie T, Buja A, Tibshirani R. Penalized discriminant analysis. The Annals of Statistics. 1995;23:73–102.
13. Müller HG, Stadtmüller U. Generalized functional linear models. Annals of Statistics. 2005;33:774–805.
14. Parsey RV, Oquendo MA, Ogden RT, Olvet DM, Simpson N, Huang Y, Van Heertum RL, Arango V, Mann JJ. Altered serotonin 1A binding in major depression: A [carbonyl-C-11]WAY100635 positron emission tomography study. Biological Psychiatry. 2006;59:106–113. doi: 10.1016/j.biopsych.2005.06.016.
15. Ramsay J, Silverman BW. Functional Data Analysis. 2nd ed. Springer; New York: 2005.
16. Reiss PT, Ogden RT. Functional principal component regression and functional partial least squares. Journal of the American Statistical Association. 2007;102:984–996.
17. Zhu H, Brown PJ, Morris JS. Robust classification of functional and quantitative image data using functional mixed models. Biometrics. 2012;68:1260–1268. doi: 10.1111/j.1541-0420.2012.01765.x.
