Journal of Applied Statistics. 2021 Feb 20;49(8):2052–2063. doi: 10.1080/02664763.2021.1889997

Bootstrap aggregated classification for sparse functional data

Hyunsung Kim, Yaeji Lim

ABSTRACT

Sparse functional data are commonly observed in real-data analyses. For such data, we propose a new classification method based on functional principal component analysis (FPCA) and bootstrap aggregating. Bootstrap aggregating is believed to improve on a single classifier. In this paper, we apply this idea to FPCA-based classification and compare the classification performance with that of the single classifiers. The simulation results show that the proposed method performs better than the conventional single classifiers. We then conduct two real-data analyses.

Keywords: Functional data, functional principal component analysis, bootstrap aggregating, classification, sparse data

2010 Mathematics Subject Classification: 35Q62

1. Introduction

The development of measurement technology has enabled us to collect data in the form of curves or functions in many fields; these data are called functional or longitudinal data. We analyze such data through functional data analysis (FDA), a method used in various fields such as meteorology and health science.

In FDA, functional data are defined on an infinite-dimensional space because they are considered to be curves or functions rather than single points, thus making dimensionality reduction a key issue. One powerful dimension reduction method in FDA is functional principal component analysis (FPCA) [18]. This method identifies the directions of variation and represents each curve through data-driven bases, with the corresponding coefficients called functional principal component (FPC) scores. Since functional data are commonly observed at sparse or irregular time points in real-data analyses, James et al. [8] proposed an FPCA method based on a reduced-rank mixed-effects framework for sparse functional data. More recently, Yao et al. [23] proposed a conditional-expectation-based FPCA method for sparse data.

In this study, we also consider the FPCA method for sparse or irregularly observed functional data, but focus on the classification problem. While FPCA has been commonly used for classification of functional data, various other methods have also been proposed. James and Hastie [7] proposed a functional linear discriminant analysis (FLDA) for sparse functional data using an expectation-maximization (EM) algorithm. James [6] and Müller and Stadtmüller [13] extended the generalized linear model to a functional analog, while Leng and Müller [11] applied a functional logistic regression based on FPC scores to temporal gene expression data and compared the results with those of the B-spline basis method. Lee [10] presented a support vector machine (SVM) based on FPC scores, and Rossi and Villa [16] proposed a functional SVM (FSVM), extending the SVM to functional data. Song et al. [19] compared classifiers based on FPC scores using linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), the k-nearest neighbor (KNN) classifier, and SVM. More recently, Fan et al. [5] proposed a kernel-induced random forest for functional data classification and applied it to temporal gene expression data.

Unlike previous studies, we propose a new classification method for sparse functional data based on FPCA and bootstrap aggregating (bagging). Bagging is an ensemble method that enhances predictions by combining the classifiers from bootstrap samples [3]. Bauer and Kohavi [2] compared bagging, bagging variants, AdaBoost and Arc-x4 on the decision tree and naive Bayes classifiers using 14 large-scale data sets from UCI, while Kim et al. [9] proposed an SVM ensemble method with bagging or boosting to improve classification performance. Shinde et al. [17] applied bagging to the kernel principal component analysis to improve the preimage estimates.

By extending bagging to functional data, we construct a bagged classification model for sparse functional data, combining the classifiers based on FPC scores from bootstrap samples. Therefore, we contribute to the literature by improving the classification performance for sparse functional data and applying an ensemble technique to functional principal components-based classification models.

The rest of the paper is organized as follows. Section 2 reviews the FPCA method for sparse functional data, and Section 3 describes the proposed method. A simulation study is presented in Section 4, while two real-data cases are analyzed in Section 5. Finally, Section 6 concludes the paper.

2. FPCA for sparse functional data

FPCA is based on the Karhunen–Loève representation of a random function. Let $X(t)$, $t \in \mathcal{T}$, be a square-integrable random process in $L^2(\mathcal{T})$ with mean function $\mu(t) = E[X(t)]$ and covariance function $G(s,t) = \mathrm{cov}[X(s), X(t)]$ for $s, t \in \mathcal{T}$. By Mercer's theorem, the covariance function can be represented as

\[ G(s,t) = \sum_{k=1}^{\infty} \lambda_k \phi_k(s) \phi_k(t), \]

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ are nonnegative eigenvalues satisfying $\sum_{k=1}^{\infty} \lambda_k < \infty$, and $\phi_k$ is the corresponding orthonormal eigenfunction. Then, given $n$ random curves, $X = [X_1(t), \ldots, X_n(t)]$, the Karhunen–Loève expansion of $X_i(t)$ can be represented as

\[ X_i(t) = \mu(t) + \sum_{k=1}^{\infty} \xi_{ik} \phi_k(t), \quad t \in \mathcal{T}, \]

where $\xi_{ik} = \int_{\mathcal{T}} (X_i(t) - \mu(t)) \phi_k(t)\,dt$ are uncorrelated random variables with mean 0 and variance $\lambda_k$, and the truncated approximation is written as

\[ X_i(t) \approx \mu(t) + \sum_{k=1}^{K} \xi_{ik} \phi_k(t), \quad t \in \mathcal{T}, \]

where $K$ is the number of retained eigenfunctions. Here, $K$ is often selected using the proportion of variance explained (PVE), but the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) can also be used for consistent selection [12,23].
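To make the PVE rule concrete, the following minimal sketch (Python/NumPy, with hypothetical eigenvalues) returns the smallest $K$ whose cumulative proportion of variance explained reaches a given threshold; the eigenvalue estimation itself is assumed to have been carried out already.

```python
import numpy as np

def choose_K_by_pve(eigenvalues, threshold=0.99):
    """Smallest K whose cumulative proportion of variance explained (PVE)
    reaches the threshold; eigenvalues are assumed sorted in decreasing
    order, as produced by an FPCA fit."""
    lam = np.asarray(eigenvalues, dtype=float)
    pve = np.cumsum(lam) / np.sum(lam)
    return int(np.searchsorted(pve, threshold) + 1)

# Hypothetical eigenvalues: K = 3 is the smallest truncation with PVE above 0.99
print(choose_K_by_pve([4.0, 2.0, 1.0, 0.05, 0.01]))
```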

However, when a curve is observed at sparse or irregular time points, we cannot directly apply conventional FPCA to the data because computing the covariance function could be difficult and the estimated FPC scores could be biased. Therefore, James et al. [8] proposed a reduced-rank model based on a mixed-effects framework and estimated the FPC functions and scores using an EM algorithm, and, more recently, Yao et al. [23] proposed the principal analysis by conditional expectation (PACE) method to obtain unbiased FPC scores. In this paper, we use the PACE method, explained in detail below.

Consider the $i$th curve $\mathbf{X}_i = (X_i(t_{i1}), \ldots, X_i(t_{in_i}))^T$ with mean vector $\boldsymbol{\mu}_i = (\mu(t_{i1}), \ldots, \mu(t_{in_i}))^T$. Here, $t_{ij} \in \mathcal{T}$ is the $j$th time point observed in the $i$th curve $\mathbf{X}_i$, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, n_i$. Furthermore, let $\mathbf{U}_i = (U_i(t_{i1}), \ldots, U_i(t_{in_i}))^T$ be the observed $i$th curve with additional measurement errors $\boldsymbol{\epsilon}_i = (\epsilon_i(t_{i1}), \ldots, \epsilon_i(t_{in_i}))^T$. Then, we have

\[ U_i(t_{ij}) = X_i(t_{ij}) + \epsilon_i(t_{ij}) = \mu(t_{ij}) + \sum_{k=1}^{\infty} \xi_{ik} \phi_k(t_{ij}) + \epsilon_i(t_{ij}), \quad t_{ij} \in \mathcal{T}, \tag{1} \]

where $\epsilon_i(t_{ij})$ is an independent and identically distributed (i.i.d.) error with mean zero and variance $\sigma^2$; it is assumed to be independent of the functional principal component scores $\xi_{ik}$, for $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, n_i$, and $k = 1, 2, \ldots$.

Assuming that $\xi_{ik}$ and $\boldsymbol{\epsilon}_i$ are jointly Gaussian, the best linear unbiased prediction (BLUP) of $\xi_{ik}$ can be computed as

\[ \tilde{\xi}_{ik} = E[\xi_{ik} \mid \mathbf{U}_i] = \lambda_k \boldsymbol{\phi}_{ik}^T \Sigma_{\mathbf{U}_i}^{-1} (\mathbf{U}_i - \boldsymbol{\mu}_i), \]

where $\boldsymbol{\phi}_{ik} = (\phi_k(t_{i1}), \ldots, \phi_k(t_{in_i}))^T$ is the $k$th FPC function evaluated at the time points of the $i$th curve, and $\Sigma_{\mathbf{U}_i} = \mathrm{cov}(\mathbf{U}_i, \mathbf{U}_i) = \mathrm{cov}(\mathbf{X}_i, \mathbf{X}_i) + \sigma^2 I_{n_i}$.

Then, the trajectory $X_i(t)$ can be predicted as

\[ \hat{X}_i(t) = \hat{\mu}(t) + \sum_{k=1}^{K} \hat{\xi}_{ik} \hat{\phi}_k(t), \tag{2} \]

where $\hat{\xi}_{ik}$ and $\hat{\phi}_k(t)$ are the estimates obtained from the entire data sample. The number of eigenfunctions, $K$, is determined using a cross-validation technique.

For more details about the PACE method, see Yao et al. [23].
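As an illustration of the conditional-expectation step, the following sketch computes the BLUP scores $\tilde{\xi}_{ik} = \lambda_k \boldsymbol{\phi}_{ik}^T \Sigma_{\mathbf{U}_i}^{-1} (\mathbf{U}_i - \boldsymbol{\mu}_i)$ for a single sparse curve with NumPy. It assumes the model components (mean, eigenfunctions, eigenvalues, and error variance) have already been estimated, for example by the pooled smoothing steps of PACE, which are not shown here.

```python
import numpy as np

def pace_scores(U_i, mu_i, Phi_i, lam, sigma2):
    """BLUP of the FPC scores for one sparsely observed curve.

    U_i    : (n_i,)   observed values of the i-th curve
    mu_i   : (n_i,)   estimated mean evaluated at the same time points
    Phi_i  : (n_i, K) estimated eigenfunctions evaluated at those points
    lam    : (K,)     estimated eigenvalues
    sigma2 : float    estimated measurement-error variance
    """
    # Sigma_{U_i} = Phi_i diag(lam) Phi_i^T + sigma^2 I_{n_i}
    Sigma_Ui = Phi_i @ np.diag(lam) @ Phi_i.T + sigma2 * np.eye(len(U_i))
    # xi_tilde_k = lam_k * phi_ik^T Sigma^{-1} (U_i - mu_i), computed for all k at once
    return lam * (Phi_i.T @ np.linalg.solve(Sigma_Ui, U_i - mu_i))
```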

3. Ensemble classification via FPCA

3.1. Classification based on FPC scores

Using the $K$ FPC scores $\hat{\xi}_{ik}$, $k = 1, \ldots, K$, in (2), we can construct classifiers based on logistic regression, SVM, LDA, QDA, or naive Bayes. Here, we briefly explain these classifiers as applied to the FPC scores.

For a logistic regression model, we consider the functional generalized linear model (FGLM) [6,13]. Given the $i$th functional curve $X_i(t)$ and corresponding response $y_i$, the FGLM can be represented as

\[ g(\mu) = \alpha + \int_{\mathcal{T}} \beta(t) X_i(t)\,dt, \tag{3} \]

where $\mu = E(y_i \mid X_i)$ and $g(\cdot)$ is a link function. Since we observe $X_i(t)$ at only finitely many time points $n_i$, the integral can be approximated by a summation.

The estimate of the coefficient function $\beta(\cdot)$ is unstable because it is an extremely high-dimensional object. Therefore, a basis expansion of $\beta(\cdot)$ is common practice, and with FPCA, $\beta(\cdot)$ can be represented as

\[ \beta(t) = \sum_{k=1}^{K} \beta_k \phi_k(t), \]

where $\phi_k(t)$, for $k = 1, 2, \ldots, K$, is an orthonormal basis function in the $K$-truncated FPCA model.

Then, because the $\phi_k$ are orthonormal and $X_i(t)$ admits the truncated FPC expansion, the integral in (3) reduces to a finite sum (with the contribution of the mean function absorbed into the intercept), so the FGLM can be represented as

\[ g(\mu) = \alpha + \sum_{k=1}^{K} \beta_k \xi_{ik}. \]

Here, $g(\cdot)$ is the logit link used to construct the binary classifier, and the classification threshold for the predicted probability is 0.5.
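In other words, once the FPC scores have been extracted, the functional logistic classifier is an ordinary logistic regression on those scores. A minimal sketch with scikit-learn, using placeholder scores and labels rather than an actual FPCA fit:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
xi_train = rng.normal(size=(100, 3))                  # placeholder FPC scores (n x K)
y_train = (xi_train[:, 0] + rng.normal(size=100) > 0).astype(int)

# g(mu) = alpha + sum_k beta_k xi_ik with the logit link
flr = LogisticRegression().fit(xi_train, y_train)
# Predicted class uses the usual 0.5 threshold on the fitted probability
y_hat = (flr.predict_proba(xi_train)[:, 1] > 0.5).astype(int)
```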

For the SVM, we simply apply the method to the $K$ FPC scores. We consider SVMs with linear and Gaussian kernels, and select the soft-margin and kernel parameters through cross-validation.

Similarly, for LDA, QDA, and naive Bayes classifiers, we determine the classification rule using the selected K FPC scores as predictors.
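For concreteness, all six single classifiers compared in this paper can be fitted to the same FPC-score matrix. The sketch below uses scikit-learn with default settings on placeholder scores; in practice the SVM parameters are tuned by cross-validation, as noted above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
xi_train = rng.normal(size=(100, 3))          # placeholder FPC scores
y_train = (xi_train[:, 0] > 0).astype(int)    # placeholder labels

single_classifiers = {
    "logistic":     LogisticRegression(),
    "svm_linear":   SVC(kernel="linear"),
    "svm_gaussian": SVC(kernel="rbf"),        # C and gamma chosen by CV in practice
    "lda":          LinearDiscriminantAnalysis(),
    "qda":          QuadraticDiscriminantAnalysis(),
    "naive_bayes":  GaussianNB(),
}
fitted = {name: clf.fit(xi_train, y_train) for name, clf in single_classifiers.items()}
```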

These are single classifiers based on FPCA for sparse functional data. In the next section, we propose bootstrap aggregated functional classifiers obtained by applying the bagging method to the single classifiers explained above.

3.2. Bootstrap aggregated functional classifier via sparse FPCA

Here, we propose a new classifier, called the bootstrap aggregated functional classifier, based on FPCA. It draws several bootstrap samples with replacement, fits a classifier to each, and aggregates the results.

Let $\mathcal{D} = \{(\mathbf{U}_i, y_i) \mid i = 1, \ldots, n\}$ be a set of $n$ sparse curves, where $\mathbf{U}_i = (U_i(t_{i1}), \ldots, U_i(t_{in_i}))^T$ is the $i$th curve observed at sparse and irregular time points, and $y_i \in \{1, \ldots, g\}$ is the response class label. Here, $g = 2$ for binary classification.

Let $\mathcal{D}^{(b)} = \{(\mathbf{U}_i^{(b)}, y_i^{(b)}) \mid i = 1, \ldots, n\}$, for $b = 1, \ldots, B$, be a bootstrap resample from $\mathcal{D}$. Now, for each bootstrap functional data set, we apply one of the classification methods described in Section 3.1 to obtain $B$ classifiers. Then, for $b = 1, \ldots, B$, let $\hat{f}^{(b)}(x)$ be the predicted class obtained from the $b$th bootstrap resample. To aggregate the results, we have two methods, the majority vote and the out-of-bag (OOB) error weighted vote [15].

  • The majority vote simply chooses the class receiving the highest total vote from all $B$ classifiers. That is, the bagged classifier from the majority vote is
    \[ \hat{y}_{\mathrm{bag}} = \arg\max_{j \in \{1,2\}} \frac{1}{B} \sum_{b=1}^{B} I\{\hat{f}^{(b)}(x) = j\}, \tag{4} \]
    where $I$ is the indicator function.
  • For the OOB error weighted vote, the OOB error [4] needs to be computed first. This is a well-known test-error estimator for bagging, calculated from the OOB samples, i.e. the training observations not selected by the bootstrap. Let $e_b$, for $b = 1, \ldots, B$, be the OOB error of the $b$th bootstrapped model. Then, we define the weight as $w_b = 1/e_b$, so that models exhibiting good performance receive higher weights and the other models receive lower weights. If $e_b = 0$, indicating no errors on the OOB samples, we set $e_b = \min\{e_1, \ldots, e_{b-1}, e_{b+1}, \ldots, e_B\}$. Then, the bagged classifier estimate using the OOB error weighted vote is
    \[ \hat{y}_{\mathrm{bag}} = \frac{\sum_{b=1}^{B} w_b \hat{f}^{(b)}(x)}{\sum_{b=1}^{B} w_b}. \tag{5} \]

The above procedure is summarized in Algorithm 1.

[Algorithm 1: Bootstrap aggregated functional classifier via sparse FPCA]
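The following sketch illustrates the aggregation step with NumPy and scikit-learn. For brevity it bags base classifiers over a fixed FPC-score matrix, whereas the proposed method re-estimates the sparse FPCA on every bootstrap resample; labels are coded 0/1, so thresholding the (weighted) vote share at 0.5 plays the role of (4) and (5).

```python
import numpy as np
from sklearn.base import clone
from sklearn.svm import SVC

def bagged_fpc_classifier(X, y, base_clf, B=100, vote="oob", seed=None):
    """Bagging over FPC-score features (sketch): returns B fitted classifiers
    and their voting weights (equal weights for the majority vote, 1/OOB-error
    weights for the OOB error weighted vote)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    models, oob_errors = [], []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)            # bootstrap resample indices
        oob = np.setdiff1d(np.arange(n), idx)       # out-of-bag indices
        clf = clone(base_clf).fit(X[idx], y[idx])
        err = np.mean(clf.predict(X[oob]) != y[oob]) if len(oob) else 0.0
        models.append(clf)
        oob_errors.append(err)
    e = np.asarray(oob_errors)
    e[e == 0] = e[e > 0].min() if np.any(e > 0) else 1.0   # avoid infinite weights
    weights = 1.0 / e if vote == "oob" else np.ones(B)
    return models, weights

def bagged_predict(models, weights, X_new):
    """Weighted vote over the B classifiers (majority vote if weights are equal)."""
    votes = np.stack([m.predict(X_new) for m in models])   # (B, m) array of 0/1 labels
    share = np.average(votes, axis=0, weights=weights)     # weighted share of class 1
    return (share > 0.5).astype(int)

# Example with a Gaussian-kernel SVM as the base learner on placeholder scores
rng = np.random.default_rng(1)
xi_train = rng.normal(size=(100, 3))
y_train = (xi_train[:, 0] > 0).astype(int)
models, w = bagged_fpc_classifier(xi_train, y_train, SVC(kernel="rbf"), B=50, seed=1)
y_pred = bagged_predict(models, w, xi_train)
```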

4. Simulation studies

4.1. Simulation 1

We conduct simulation studies by referring to the models in Wu and Liu [22]. We generate $N = 200$ curves with two classes as

\[ U_{gi}(t_{ij}) = \mu_g(t_{ij}) + \sum_{k=1}^{3} \xi_{gk} \phi_k(t_{ij}) + \epsilon_i(t_{ij}), \quad i = 1, \ldots, 100, \; j = 1, \ldots, n_i, \tag{6} \]

where $g \in \{0, 1\}$ indicates the group label. The FPC functions are defined as

\[ \phi_k(t) = \begin{cases} \cos(\pi k t / 5) / \sqrt{5}, & k \text{ odd}, \\ \sin(\pi k t / 5) / \sqrt{5}, & k \text{ even}, \end{cases} \tag{7} \]

and the FPC scores, $\xi_{gk}$, are sampled i.i.d. from $N(0, \lambda_{gk})$, for $k = 1, 2, 3$. The measurement error $\epsilon_i(t)$ is sampled i.i.d. from $N(0, 0.5^2)$.

We consider three models: (A) a different mean and variance model, (B) a different mean model, and (C) a different variance model. For each case, the mean function μg(t) and λgk are defined as in Table 1.

Table 1.

Parameters for the models in Simulation 1.

Model                             g   $\mu_g(t)$      $(\lambda_{g1}, \lambda_{g2}, \lambda_{g3})$
(A) Different mean and variance   0   $t + \sin(t)$   (4, 2, 1)
                                  1   $t + \cos(t)$   (16, 8, 4)
(B) Different mean                0   $t + \sin(t)$   (4, 2, 1)
                                  1   $t + \cos(t)$   (4, 2, 1)
(C) Different variance            0   $t + \sin(t)$   (4, 2, 1)
                                  1   $t + \sin(t)$   (16, 8, 4)

To make each curve sparse, we randomly select the number of observations for the $i$th curve, $n_i$, from $\{5, 6, \ldots, 10\}$, and the corresponding time points $t_{ij}$, for $j = 1, \ldots, n_i$, i.i.d. from $\mathrm{Uniform}(0, 10)$.
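A sketch of this data-generating mechanism follows (Python/NumPy). The $1/\sqrt{5}$ scaling of the eigenfunctions is assumed from the orthonormality requirement in (7), and the group parameters shown are those of model (A) in Table 1.

```python
import numpy as np

def phi(k, t):
    """Eigenfunctions of (7): cosine for odd k, sine for even k (1/sqrt(5) scaling assumed)."""
    trig = np.cos(np.pi * k * t / 5) if k % 2 == 1 else np.sin(np.pi * k * t / 5)
    return trig / np.sqrt(5)

def simulate_group(n_curves, mean_fn, lambdas, rng):
    """Sparse curves from model (6): mu_g(t) + sum_k xi_gk phi_k(t) + measurement error."""
    curves = []
    for _ in range(n_curves):
        n_i = rng.integers(5, 11)                      # 5 to 10 observations per curve
        t = np.sort(rng.uniform(0, 10, size=n_i))      # irregular time points
        xi = rng.normal(0, np.sqrt(lambdas))           # FPC scores, one per component
        u = mean_fn(t) + sum(xi[k] * phi(k + 1, t) for k in range(len(lambdas))) \
            + rng.normal(0, 0.5, size=n_i)             # measurement error with sd 0.5
        curves.append((t, u))
    return curves

rng = np.random.default_rng(0)
group0 = simulate_group(100, lambda t: t + np.sin(t), np.array([4.0, 2.0, 1.0]), rng)
group1 = simulate_group(100, lambda t: t + np.cos(t), np.array([16.0, 8.0, 4.0]), rng)
```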

Figure 1 plots the sample curves generated from the model in (6). The estimated functional mean curve for each group is plotted in Figure 2.

Figure 1. Sample functional curves for each group in Simulation 1.

Figure 2. Estimated functional mean curve for each group in Simulation 1.

For validation, we randomly split the generated data into a training set and a test set of 100 curves each. We compare the results of the proposed method with those of six single functional classification models: logistic regression, SVM with a linear kernel, SVM with a Gaussian kernel, LDA, QDA, and naive Bayes. Since the data are sparse, some curves in the test set are not within the range of time points of the bootstrap sample or training set. Therefore, the classification error rate is evaluated over all curves except those that are out of range.

We select the tuning parameters for the SVM models through 10-fold cross-validation on the whole training set. We select the number of FPC scores, K, as the smallest value whose PVE exceeds 0.99; in this simulation setting, the selected K ranges from 2 to 6.
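A sketch of the SVM tuning step, using scikit-learn's GridSearchCV on placeholder FPC scores; the grid values are illustrative and not taken from the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
xi_train = rng.normal(size=(100, 3))          # placeholder FPC scores of the training set
y_train = (xi_train[:, 0] > 0).astype(int)

# 10-fold cross-validation over the soft-margin and kernel parameters
param_grid = {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.1, 1]}
svm_cv = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10).fit(xi_train, y_train)
best_svm = svm_cv.best_estimator_
```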

The average classification errors and standard errors over 500 Monte Carlo repetitions are presented in Table 2. For all three models, the proposed ensemble classification methods outperform the single classification methods. In particular, the OOB error weighted vote shows the lowest classification error rate in all cases.

Table 2.

The average classification errors (%) and standard errors (in parentheses) from 500 Monte Carlo repetitions for the three designs in Simulation 1.

Model Method Logistic regression SVM (Linear) SVM (Gaussian) LDA QDA Naive Bayes
(A) Single 17.09 (5.34) 16.85 (5.13) 15.43 (4.86) 16.73 (5.36) 15.47 (4.95) 16.90 (4.74)
  Majority vote 15.15 (4.45) 15.19 (4.37) 13.37 (4.33) 15.10 (4.42) 14.20 (4.16) 15.70 (4.38)
  OOB weight 14.96 (4.36) 14.88 (4.18) 13.10 (4.11) 14.73 (4.31) 13.63 (4.00) 15.03 (4.23)
(B) Single 11.29 (3.58) 11.07 (3.55) 11.61 (4.04) 10.68 (3.45) 12.04 (3.83) 13.40 (4.07)
  Majority vote 10.21 (3.40) 10.17 (3.22) 10.63 (3.59) 9.95 (3.26) 10.94 (3.48) 11.99 (3.64)
  OOB weight 10.19 (3.34) 10.05 (3.17) 10.54 (3.53) 9.91 (3.22) 10.79 (3.46) 11.74 (3.59)
(C) Single 50.67 (5.91) 49.62 (5.41) 32.92 (5.57) 50.70 (5.92) 31.57 (4.70) 30.57 (4.79)
  Majority vote 49.91 (5.96) 48.79 (6.03) 31.11 (5.62) 49.82 (6.00) 30.86 (4.99) 29.74 (4.77)
  OOB weight 49.62 (6.01) 48.80 (5.95) 31.04 (5.51) 49.63 (6.03) 30.73 (4.95) 29.69 (4.71)

Note: The minimum error rate is marked in bold.

4.2. Simulation 2

Simulation 2 is motivated by Yao et al. [24]. We generate $N = 200$ curves as

\[ U_i(t) = \sum_{k=1}^{50} \xi_{ik} \phi_k(t) + \epsilon_i(t), \quad i = 1, \ldots, N. \]

The FPC functions, $\phi_k(t)$, are generated as in (7), and the FPC scores, $\xi_{ik}$, are sampled i.i.d. from $N(0, k^{-3/2})$, for $k = 1, \ldots, 50$. To make the data sparse, the number of observations for the $i$th curve, $n_i$, is randomly selected from $\{10, 11, \ldots, 20\}$, and the corresponding time points $t_{ij}$, for $j = 1, \ldots, n_i$, are drawn i.i.d. from $\mathrm{Uniform}(0, 10)$. The measurement error, $\epsilon_i(t)$, is sampled i.i.d. from $N(0, 0.1)$.

Now, we consider the following three models:

\[ \text{Model (A).}\quad f(U_i) = \exp(\langle \beta_1, U_i \rangle / 2) - 1, \]
\[ \text{Model (B).}\quad f(U_i) = \arctan(\pi \langle \beta_1, U_i \rangle) + \exp(\langle \beta_2, U_i \rangle / 3) - 1, \]
\[ \text{Model (C).}\quad f(U_i) = \arctan(\pi \langle \beta_1, U_i \rangle / 4), \]

where $\langle f, g \rangle = \int_{\mathcal{T}} f(t) g(t)\,dt$ for $f, g \in L^2(\mathcal{T})$. Here, $\beta_1(t) = \sum_{k=1}^{50} b_k \phi_k(t)$, where $b_k = 1$ for $k = 1, 2$ and $b_k = (k-2)^{-3}$ for $k = 3, \ldots, 50$, and $\beta_2(t) = \sqrt{3/10}\,(t/5 - 1)$. We define the class label for each curve, $U_i$, as $y_i = \mathrm{sign}\{f(U_i) + \epsilon_i\}$, where $\epsilon_i \sim \text{i.i.d. } N(0, 0.1)$. The estimated functional mean curve for each group is plotted in Figure 3.
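The label-generating step can be sketched as follows, using the basis-coefficient form of the inner product, $\langle \beta_1, U_i \rangle \approx \sum_k b_k \xi_{ik}$; the measurement-error contribution to the inner product is ignored for brevity, and the negative exponents follow the reconstruction above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, Kmax = 200, 50
k = np.arange(1, Kmax + 1)

# FPC scores xi_ik ~ N(0, k^{-3/2}), so the standard deviation is k^{-3/4}
xi = rng.normal(0, k ** (-0.75), size=(N, Kmax))

# Coefficients of beta_1 in the eigenbasis: b_k = 1 for k = 1, 2 and (k-2)^{-3} otherwise
b = np.ones(Kmax)
b[2:] = (k[2:] - 2.0) ** (-3)

proj1 = xi @ b                                    # <beta_1, U_i> for every curve
f_A = np.exp(proj1 / 2) - 1                       # Model (A)
f_C = np.arctan(np.pi * proj1 / 4)                # Model (C)

# Class labels: y_i = sign{ f(U_i) + eps_i }, eps_i ~ N(0, 0.1) (variance 0.1 assumed)
y_A = np.sign(f_A + rng.normal(0, np.sqrt(0.1), size=N))
```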

Figure 3. Estimated functional mean curve for each group in Simulation 2.

We randomly split the generated data into 100 training and 100 test curves, and compare the results of the proposed methods with those of the single classification models. The average classification errors and standard errors from 500 Monte Carlo repetitions are presented in Table 3. Note that the number of FPC scores, K, is chosen from 2 to 6.

Table 3.

The average classification errors (%) and standard errors (in parentheses) from 500 Monte Carlo repetitions for the models in Simulation 2.

Model Method Logistic regression SVM (Linear) SVM (Gaussian) LDA QDA Naive Bayes
A Single 17.92 (4.24) 18.12 (4.34) 19.20 (4.77) 17.95 (4.30) 19.54 (4.64) 20.03 (4.64)
  Majority vote 16.53 (3.95) 16.68 (3.96) 17.24 (4.17) 16.64 (3.87) 17.78 (4.22) 18.53 (4.40)
  OOB weight 16.50 (3.95) 16.65 (3.95) 17.21 (4.20) 16.59 (3.86) 17.71 (4.16) 18.45 (4.45)
B Single 14.39 (4.11) 14.54 (4.17) 15.39 (4.80) 14.44 (4.23) 15.74 (4.36) 16.81 (4.80)
  Majority vote 12.43 (3.73) 12.71 (3.90) 13.15 (4.16) 12.68 (3.81) 13.63 (4.12) 14.97 (4.35)
  OOB weight 12.34 (3.63) 12.61 (3.89) 13.01 (4.12) 12.49 (3.79) 13.47 (4.14) 14.73 (4.22)
C Single 15.47 (4.06) 15.75 (4.16) 16.66 (4.66) 15.52 (4.17) 17.13 (4.53) 18.00 (4.73)
  Majority vote 13.94 (3.67) 14.16 (3.80) 14.78 (4.00) 14.18 (3.72) 15.09 (4.08) 16.30 (4.35)
  OOB weight 13.83 (3.65) 14.11 (3.78) 14.74 (3.97) 14.10 (3.66) 14.98 (4.03) 16.12 (4.34)

For all the models, the proposed ensemble classifiers show lower classification error rates than the single classifiers. In particular, aggregating works better with the OOB error weighted vote than with the majority vote for all models.

5. Real-data analysis

5.1. Berkeley growth data

Here, we consider the Berkeley growth data set [21], which includes the height data of 93 individuals (54 girls and 39 boys). Each curve has 31 time points from ages 1 to 18; the original curves are shown in Figure 4.

Figure 4. The Berkeley growth data of 93 individuals.

Here, we artificially sparsify the data. For each individual, we randomly select the number of observations from {12, 13, …, 15} and the corresponding time points from the original time points, as sketched below. For validation, we randomly divide the 93 curves into 62 training and 31 test curves, and compare the gender-classification performances. The number of FPC scores, K, is selected from 2 to 5.
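A sketch of this artificial sparsification step (Python/NumPy); the dense growth curve used here is a placeholder, not the actual Berkeley data.

```python
import numpy as np

def sparsify(times, values, rng, low=12, high=15):
    """Keep a random subset of 12-15 of the original observation times for one curve."""
    n_keep = rng.integers(low, high + 1)
    keep = np.sort(rng.choice(len(times), size=n_keep, replace=False))
    return times[keep], values[keep]

rng = np.random.default_rng(0)
ages = np.linspace(1, 18, 31)                  # 31 original time points, ages 1 to 18
height = 80 + 90 * (ages / 18) ** 0.8          # placeholder growth curve (cm)
sparse_ages, sparse_height = sparsify(ages, height, rng)
```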

We repeat this process 500 times with different splits; the average results are summarized in Table 4. For all methods, the proposed bagging classifiers exhibit better accuracy than the single classification models. In particular, the bagged QDA with majority vote shows the lowest classification error rate, 4.88%.

Table 4.

The average classification errors (%) and standard errors (in parentheses) from 500 random splits of the Berkeley growth data.

Method Logistic regression SVM (Linear) SVM (Gaussian) LDA QDA Naive Bayes
Single 6.90 (4.31) 5.55 (3.42) 5.90 (3.55) 5.77 (3.30) 5.74 (3.41) 5.93 (3.92)
Majority vote 5.84 (3.76) 5.12 (3.22) 5.24 (3.29) 5.33 (3.25) 4.88 (3.17) 5.61 (3.65)
OOB weight 5.81 (3.73) 5.25 (3.26) 5.26 (3.26) 5.36 (3.27) 4.93 (3.18) 5.43 (3.56)

5.2. Spinal bone mineral density data

For genuinely sparse functional data, we consider the spinal bone mineral density data [1]. The data set contains the spinal bone mineral density of 280 individuals (153 females and 127 males) measured at sparse and irregular time points. Each curve has two to four observations (Figure 5). We compare the various methods for gender classification.

Figure 5. Spinal bone mineral density of 280 individuals.

We randomly divide the data into 187 training and 93 test curves, and apply the methods to 500 different data splits. The number of FPC scores, K, ranges from 3 to 6. The average results over the 500 splits are summarized in Table 5. The proposed bagging classifiers show improved performance in all cases, and the bagged logistic regression with majority vote shows the lowest classification error rate, 30.94%.

Table 5.

The average classification errors (%) and standard errors (in parentheses) from 500 random splits of the spinal bone mineral density data.

Method Logistic regression SVM (Linear) SVM (Gaussian) LDA QDA Naive Bayes
Single 32.07 (4.19) 32.29 (4.20) 33.30 (4.40) 32.00 (4.16) 35.61 (4.58) 34.53 (4.80)
Majority vote 30.94 (4.07) 31.24 (4.23) 31.80 (4.16) 30.99 (3.93) 32.68 (4.31) 32.35 (4.11)
OOB weight 30.94 (4.13) 31.27 (4.18) 31.75 (4.18) 31.00 (3.98) 32.72 (4.32) 32.28 (4.07)

6. Conclusion and discussion

In this paper, we propose a new ensemble classification method for sparse functional data. Since observed curves are often sparse and irregular in real-data analyses, we use FPC scores estimated using the PACE method. We then propose a bagged model combining classifiers based on FPC scores. Bagging is usually applied to weak learners [20]. However, since we want to compare the performance of single classifiers and the corresponding ensemble classifiers, we consider various learners, including weak and stable classifiers.

We consider two aggregating methods, the majority vote and the OOB error weighted vote, and compare their classification performances with those of the single classifiers. Our simulation results confirm that the proposed aggregated classification models outperform the single classifiers in various settings. Two real-data analyses show the superiority of the proposed method.

The ensemble classification method can be easily extended to multi-class classification problems, where the aggregating method is expected to outperform single classifiers. Other ensemble methods such as boosting and stacking can also be used [14].

Acknowledgments

This research is supported by the National Research Foundation of Korea (NRF) funded by the Korea government (2019R1A2C4069453).

Funding Statement

This research is supported by the National Research Foundation of Korea (NRF) funded by the Korea government (2019R1A2C4069453) and Korea Institute of Energy Technology Evaluation and Planning (KETEP) and the Ministry of Trade, Industry & Energy (MOTIE) of the Republic of Korea (No. 20199710100060).

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Bachrach L.K., Hastie T., Wang M.C., Narasimhan B., and Marcus R., Bone mineral acquisition in healthy Asian, Hispanic, black, and Caucasian youth: A longitudinal study, J. Clin. Endocrinol. Metab. 84 (1999), pp. 4702–4712.
  • 2. Bauer E. and Kohavi R., An empirical comparison of voting classification algorithms: Bagging, boosting, and variants, Mach. Learn. 36 (1999), pp. 105–139.
  • 3. Breiman L., Bagging predictors, Mach. Learn. 24 (1996), pp. 123–140.
  • 4. Breiman L., Out-of-bag estimation, Tech. Rep., Statistics Dept., University of California at Berkeley, CA, 1996.
  • 5. Fan G., Cao J., and Wang J., Functional data classification for temporal gene expression data with kernel-induced random forests, 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, IEEE, 2010, pp. 1–5.
  • 6. James G.M., Generalized linear models with functional predictors, J. R. Stat. Soc. Ser. B (Statist. Methodol.) 64 (2002), pp. 411–432.
  • 7. James G.M. and Hastie T.J., Functional linear discriminant analysis for irregularly sampled curves, J. R. Stat. Soc. Ser. B (Statist. Methodol.) 63 (2001), pp. 533–550.
  • 8. James G.M., Hastie T.J., and Sugar C.A., Principal component models for sparse functional data, Biometrika 87 (2000), pp. 587–602.
  • 9. Kim H.C., Pang S., Je H.M., Kim D., and Bang S.Y., Constructing support vector machine ensemble, Pattern Recognit. 36 (2003), pp. 2757–2767.
  • 10. Lee H.J., Functional data analysis: Classification and regression, Ph.D. diss., Texas A&M University, 2004.
  • 11. Leng X. and Müller H.G., Classification using functional data analysis for temporal gene expression data, Bioinformatics 22 (2006), pp. 68–76.
  • 12. Li Y., Wang N., and Carroll R.J., Selecting the number of principal components in functional data, J. Am. Stat. Assoc. 108 (2013), pp. 1284–1294.
  • 13. Müller H.G. and Stadtmüller U., Generalized functional linear models, Ann. Statist. 33 (2005), pp. 774–805.
  • 14. Opitz D. and Maclin R., Popular ensemble methods: An empirical study, J. Artif. Intell. Res. 11 (1999), pp. 169–198.
  • 15. Pham H.T., Generalized weighting for bagged ensembles, Ph.D. diss., Iowa State University, 2018.
  • 16. Rossi F. and Villa N., Support vector machine for functional data classification, Neurocomputing 69 (2006), pp. 730–742.
  • 17. Shinde A., Sahu A., Apley D., and Runger G., Preimages for variation patterns from kernel PCA and bagging, IIE Trans. 46 (2014), pp. 429–456.
  • 18. Silverman B.W., Smoothed functional principal components analysis by choice of norm, Ann. Statist. 24 (1996), pp. 1–24.
  • 19. Song J.J., Deng W., Lee H.J., and Kwon D., Optimal classification for time-course gene expression data using functional data analysis, Comput. Biol. Chem. 32 (2008), pp. 426–432.
  • 20. Sutton C.D., Classification and regression trees, bagging, and boosting, Handbook Statist. 24 (2005), pp. 303–329.
  • 21. Tuddenham R.D. and Snyder M.M., Physical growth of California boys and girls from birth to eighteen years, Univ. California Publ. Child Dev. 1 (1954), pp. 183–364.
  • 22. Wu Y. and Liu Y., Functional robust support vector machines for sparse and irregular longitudinal data, J. Comput. Graph. Stat. 22 (2013), pp. 379–395.
  • 23. Yao F., Müller H.G., and Wang J.L., Functional data analysis for sparse longitudinal data, J. Am. Stat. Assoc. 100 (2005), pp. 577–590.
  • 24. Yao F., Wu Y., and Zou J., Probability-enhanced effective dimension reduction for classifying sparse functional data, Test 25 (2016), pp. 1–22.
