Author manuscript; available in PMC 2017 Mar 1.
Published in final edited form as: Test (Madr). 2016 Jan 25;25(1):44–46. doi: 10.1007/s11749-015-0474-y

Comments on: Probability Enhanced Effective Dimension Reduction for Classifying Sparse Functional Data

Chong Zhang 1, Yufeng Liu 2
PMCID: PMC4972025  NIHMSID: NIHMS754872  PMID: 27499610

The authors are to be congratulated for their solid contribution in providing a powerful method that handles classification and dimension reduction problems for functional data sets. This type of problem has drawn much attention in the literature and is known to be difficult due to the complex structure of the corresponding data sets. In particular, many existing dimension reduction methods ignore the relationship between predictors and labels, and perform dimension reduction using only the covariates. Such procedures can be suboptimal and may lead to unstable results, especially when the predictors are sparsely observed. The proposed PEFCS method integrates the observed labels into the dimension reduction step by estimating class conditional probabilities, and is shown to enjoy more competitive and robust performance in numerical examples.

This interesting paper leads to many promising research directions. For example, estimation of the class conditional probability, which we denote by $P_j(\hat{X}_i) = \mathrm{pr}(Y = j \mid \hat{X}_i)$, is a crucial step in the PEFCS method. In the literature, it is known that classification methods can be grouped into two main categories: soft and hard classifiers (Wahba, 2002; Liu et al, 2011). Soft classifiers directly estimate class conditional probabilities, which in turn yield classification rules. Typical examples of soft classification include Fisher's LDA and logistic regression. In contrast, hard classifiers bypass direct estimation of probabilities and focus on classification boundary estimation. Typical examples of hard classification include the support vector machine (SVM, Boser et al, 1992; Cortes and Vapnik, 1995) and ψ-learning (Shen et al, 2003). In Liu et al (2011), it was observed that the classification performance of various classifiers depends heavily on the underlying distribution of (X, Y). In this paper, the authors use the hinge loss of the SVM. Therefore, a possible generalization of the proposed technique is to employ a more general loss function in their equation (5) for probability estimation, instead of using the weighted SVM. We briefly discuss this idea below.

Consider the optimization problem

$$\min_{g \in \mathcal{H}_K} \sum_{i=1}^{n} \ell\{Y_i\, g(\hat{X}_i)\} + \lambda \|g\|_{\mathcal{H}_K}^{2}, \qquad (1)$$

where $\ell(\cdot)$ is a differentiable loss function for a soft classifier. One can verify that $P_{+1}(\hat{X}_i)$ can be estimated by $\ell'\{-\hat{g}(\hat{X}_i)\}/[\ell'\{\hat{g}(\hat{X}_i)\} + \ell'\{-\hat{g}(\hat{X}_i)\}]$ (Liu et al, 2011). For standard classification where the predictors are scalars or vectors, Liu et al (2011) pointed out that when the underlying class conditional probability, as a function of the predictors, is relatively smooth, soft classifiers tend to perform better than hard ones. Moreover, the transition behavior from soft to hard classifiers was thoroughly investigated using the large-margin unified machine family proposed by Liu et al (2011). For functional data classification, the comparison between soft and hard classifiers and the corresponding transition behavior are largely unknown, and further exploration in this direction can be very interesting.
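To make the probability-recovery step concrete, the following is a minimal numerical sketch (ours, not the authors' implementation) of problem (1) under the logistic loss $\ell(u) = \log(1 + e^{-u})$ with a Gaussian kernel; the data-generating model, bandwidth, regularization level, and step size are all illustrative choices. For this loss, the ratio $\ell'\{-\hat{g}\}/[\ell'\{\hat{g}\} + \ell'\{-\hat{g}\}]$ reduces algebraically to the sigmoid of $\hat{g}$, which the sketch verifies.

```python
import numpy as np

# Sketch of problem (1) with the logistic loss l(u) = log(1 + exp(-u)).
# By the representer theorem, g(x) = sum_i alpha_i K(x_i, x), and the
# RKHS penalty is ||g||^2 = alpha' K alpha.

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
p_true = 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))    # smooth true P(Y=+1|X)
Y = np.where(rng.uniform(size=n) < p_true, 1.0, -1.0)

def gram(A, B, bw=1.0):
    """Gaussian kernel Gram matrix."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * bw ** 2))

def dloss(u):
    """Derivative of the logistic loss: l'(u) = -1 / (1 + exp(u))."""
    return -1.0 / (1.0 + np.exp(u))

K = gram(X, X)
lam = 0.1
alpha = np.zeros(n)
for _ in range(5000):                            # gradient descent on (1)
    g = K @ alpha
    grad = K @ (Y * dloss(Y * g)) + 2.0 * lam * (K @ alpha)
    alpha -= 1e-4 * grad

# Probability recovery for a differentiable loss (Liu et al, 2011):
# P(Y=+1|x) = l'(-g(x)) / { l'(g(x)) + l'(-g(x)) },
# which for the logistic loss equals the sigmoid 1 / (1 + exp(-g)).
g_hat = K @ alpha
p_hat = dloss(-g_hat) / (dloss(g_hat) + dloss(-g_hat))
assert np.allclose(p_hat, 1.0 / (1.0 + np.exp(-g_hat)))
print("mean |p_hat - p_true|:", np.abs(p_hat - p_true).mean())
```

Swapping in a different differentiable loss only requires changing dloss; this is the sense in which (1) generalizes the weighted SVM in the authors' equation (5).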

Another potential research topic is to extend the PEFCS methodology to handle multicategory problems. In this case, the construction of slices in the EDR method becomes more involved. In particular, when $Y \in \{+1, -1\}$, only one direction of the EDR space can be recovered, because of the existence of homogeneity in learning problems with binary responses. To overcome this difficulty, Shin et al (2014) proposed to construct slices based on $P_{+1}(\hat{X}_i)$. In multicategory classification, estimation of the class conditional probabilities becomes more complex, as one needs to estimate a probability vector $\{P_1(\hat{X}_i), P_2(\hat{X}_i), \ldots, P_k(\hat{X}_i)\}$. Furthermore, how to construct $S_{(P_1, P_2, \ldots, P_k) \mid X}$ remains unclear. Therefore, it can be interesting and challenging to develop new methodology in this future research direction. Next, we provide one possible way to generalize the PEFCS methodology to multicategory problems.

For margin-based classification, when the number of classes is three or larger, one classification function g(·) is not enough to discriminate all classes. To overcome this difficulty, a common approach in the literature is to use k functions for k classes, and impose a sum-to-zero constraint on the k functions to reduce the parameter space and to ensure some theoretical properties such as Fisher consistency. Recently, Zhang and Liu (2014) suggested that using k functions and the sum-to-zero constraint can be inefficient and suboptimal, and proposed the angle-based large margin classifiers for multicategory classification. In particular, consider a simplex W with k vertices {W1,…, Wk} in a (k − 1)-dimensional space, such that

$$W_j = \begin{cases} (k-1)^{-1/2}\,\mathbf{1}_{k-1}, & j = 1,\\ -(1+k^{1/2})/\{(k-1)^{3/2}\}\,\mathbf{1}_{k-1} + \{k/(k-1)\}^{1/2}\, e_{j-1}, & 2 \le j \le k, \end{cases}$$

where $\mathbf{1}_{k-1}$ is a vector of 1's with length $k-1$, and $e_j \in \mathbb{R}^{k-1}$ is a vector with the $j$th element 1 and 0 elsewhere. In angle-based classification, one uses a $(k-1)$-dimensional classification function $f = (f_1, \ldots, f_{k-1})^T$, which maps $x$ to $f(x) \in \mathbb{R}^{k-1}$, where $x$ is the predictor vector. Observe that $f$ introduces $k$ angles with respect to $W_1, \ldots, W_k$, namely, $\angle(f, W_j)$; $j = 1, \ldots, k$. The prediction rule is based on which angle is the smallest. In particular, $\hat{y}(x) = \mathrm{argmin}_{j \in \{1,\ldots,k\}}\, \angle(f, W_j)$, where $\hat{y}(x)$ is the predicted label for $x$. Based on the observation that

$$\mathop{\mathrm{argmin}}_{j \in \{1,\ldots,k\}} \angle(f, W_j) = \mathop{\mathrm{argmax}}_{j \in \{1,\ldots,k\}} \langle f, W_j \rangle,$$

Zhang and Liu (2014) proposed the following optimization problem for the angle-based classifier

$$\min_{f} \sum_{i=1}^{n} \ell\{\langle W_{y_i}, f(x_i)\rangle\} + \lambda J(f), \qquad (2)$$

where $\ell(\cdot)$ is a binary margin-based surrogate loss function, $J(f)$ is a penalty on $f$ to prevent overfitting, and $\lambda$ is a tuning parameter that balances the goodness of fit and the model complexity. One advantage of the angle-based classifier is that it is free of the commonly used sum-to-zero constraint; hence it can be more efficient for learning with big data sets. Thus, generalization of the PEFCS method in the angle-based framework should be feasible and promising.
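As a concrete illustration (our sketch, not code from Zhang and Liu, 2014), the snippet below builds the simplex vertices displayed above, checks that they have unit norm and common pairwise inner product $-1/(k-1)$, and applies the angle-based prediction rule via the equivalence between the smallest angle and the largest inner product; the fitted value f_x is a hypothetical placeholder.

```python
import numpy as np

def simplex_vertices(k):
    """Vertices W_1, ..., W_k of the regular simplex in R^{k-1}
    from the display above."""
    W = np.empty((k, k - 1))
    W[0] = (k - 1) ** -0.5 * np.ones(k - 1)
    for j in range(2, k + 1):
        e = np.zeros(k - 1)
        e[j - 2] = 1.0                       # e_{j-1}, 0-based indexing
        W[j - 1] = (-(1 + k ** 0.5) / (k - 1) ** 1.5 * np.ones(k - 1)
                    + (k / (k - 1)) ** 0.5 * e)
    return W

k = 4
W = simplex_vertices(k)

# Each vertex has unit norm and all pairwise inner products equal
# -1/(k-1), so the k vertices are pairwise equiangular.
assert np.allclose(np.linalg.norm(W, axis=1), 1.0)
off_diag = W @ W.T - np.eye(k) * (1.0 + 1.0 / (k - 1))
assert np.allclose(off_diag, -1.0 / (k - 1))

def predict(f_x, W):
    """Angle-based rule: since all ||W_j|| = 1, the smallest angle
    to W_j is attained at the largest inner product <f(x), W_j>."""
    return int(np.argmax(W @ f_x)) + 1       # labels coded 1, ..., k

f_x = np.array([0.8, -0.2, 0.5])             # hypothetical fitted f(x)
print(predict(f_x, W))
```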

Acknowledgments

The authors were supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) (Zhang), US NSF grant DMS-1407241 (Liu), and NIH/NCI grant R01 CA-149569 (Liu).

Contributor Information

Chong Zhang, Email: chong.zhang@uwaterloo.ca, Department of Statistics and Actuarial Science, University of Waterloo, Tel.: +1-519-888-4567, ext. 31515.

Yufeng Liu, Email: yfliu@email.unc.edu, Department of Statistics and Operations Research, Department of Genetics, Department of Biostatistics, Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Tel.: +1-919-843-1899, Fax: +1-919-962-1279.

References

  1. Boser BE, Guyon IM, Vapnik VN. A training algorithm for optimal margin classifiers. In: Haussler D, editor. Proceedings of the Fifth Annual Workshop on Computational Learning Theory (COLT '92). New York: Association for Computing Machinery; 1992. pp. 144–152. doi: 10.1145/130385.130401.
  2. Cortes C, Vapnik VN. Support-vector networks. Machine Learning. 1995;20:273–297.
  3. Liu Y, Zhang HH, Wu Y. Hard or soft classification? Large-margin unified machines. Journal of the American Statistical Association. 2011;106:166–177. doi: 10.1198/jasa.2011.tm10319.
  4. Shen X, Tseng GC, Zhang X, Wong WH. On ψ-learning. Journal of the American Statistical Association. 2003;98:724–734.
  5. Shin SJ, Wu Y, Zhang HH, Liu Y. Probability-enhanced sufficient dimension reduction for binary classification. Biometrics. 2014;70(3):546–555. doi: 10.1111/biom.12174.
  6. Wahba G. Soft and hard classification by reproducing kernel Hilbert space methods. Proceedings of the National Academy of Sciences. 2002;99(26):16524–16530. doi: 10.1073/pnas.242574899.
  7. Zhang C, Liu Y. Multicategory angle-based large-margin classification. Biometrika. 2014;101(3):625–640. doi: 10.1093/biomet/asu017.
