Author manuscript; available in PMC: 2018 Oct 24.
Published in final edited form as: Ann Appl Stat. 2016 Jan 28;9(4):1709–1725. doi: 10.1214/15-AOAS866
ST Standard training. This method uses the ℓ1-penalized regression techniques outlined in Section 2.2, training one model on the entire training set. The regularization parameter λ is chosen through cross-validation.
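As a rough illustration of the ST baseline, the sketch below fits a single ℓ1-penalized (lasso) regression on a whole synthetic training set, with λ chosen by cross-validation; scikit-learn's `LassoCV` and the simulated data are stand-ins, not the paper's code.

```python
# Sketch of ST: one l1-penalized model on all training data,
# lambda chosen by k-fold cross-validation (synthetic data, not the paper's).
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
beta = np.zeros(10)
beta[:3] = [2.0, -1.5, 1.0]                  # sparse true signal
y = X @ beta + rng.normal(scale=0.5, size=200)

# LassoCV scans a grid of regularization strengths via 5-fold CV
# and refits on the full training set at the selected lambda.
model = LassoCV(cv=5).fit(X, y)
print(model.alpha_)                          # the cross-validated lambda
```

The ℓ1 penalty drives many coefficients exactly to zero, so the selected model is sparse as well as regularized.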
SVM Support vector machine. The cost-tuning parameter is chosen through cross-validation.
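The SVM baseline's cost tuning can be sketched as a grid search over C with cross-validation; the RBF kernel, the C grid, and the toy data here are illustrative assumptions, not the paper's configuration.

```python
# Sketch of the SVM baseline: cost parameter C tuned by 5-fold CV
# on hypothetical two-class data.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # linearly separable-ish labels

# GridSearchCV evaluates each candidate cost by cross-validated accuracy
# and refits the best one on the full training set.
search = GridSearchCV(SVC(kernel="rbf"), {"C": [0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)
print(search.best_params_["C"])
```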
KSVM K-means + SVM. We cluster the training data into K clusters via the K-means algorithm and fit an SVM to each training cluster. Test data are assigned to the nearest cluster centroid. This method is a simpler, special case of the clustered SVMs proposed by Gu and Han (2013), whose recommendation of K = 8 we use.
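The KSVM routing described above can be sketched directly: cluster the training data with K-means (K = 8, per Gu and Han), fit one SVM per cluster, and send each test point to the model of its nearest centroid. The majority-vote fallback for single-class clusters is an implementation assumption, not from the paper.

```python
# Sketch of KSVM: K-means clustering + one SVM per training cluster,
# with test points routed to the nearest centroid (synthetic data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 4))
y = (X[:, 0] > 0).astype(int)
X_test = rng.normal(size=(50, 4))

K = 8  # the value recommended by Gu and Han (2013)
km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)

# Fit one SVM per cluster; fall back to the majority class if a
# cluster happens to contain only one label (assumed handling).
models = {}
for k in range(K):
    mask = km.labels_ == k
    if np.unique(y[mask]).size < 2:
        models[k] = int(y[mask].mean() > 0.5)
    else:
        models[k] = SVC(kernel="rbf", C=1.0).fit(X[mask], y[mask])

# Each test point uses the SVM of its nearest cluster centroid.
assign = km.predict(X_test)
pred = np.array([
    m.predict(x.reshape(1, -1))[0] if hasattr(m, "predict") else m
    for x, m in ((x, models[k]) for x, k in zip(X_test, assign))
])
print(pred.shape)
```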
RF Random forests. At each split we consider √p of the p predictor variables (classification) or p/3 of the p predictor variables (regression).
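These per-split subsampling rates (√p for classification, p/3 for regression) correspond to the standard defaults and can be set explicitly, as in this scikit-learn sketch on simulated data:

```python
# Sketch of the RF baseline: sqrt(p) candidate variables per split for
# classification and p/3 for regression (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(3)
p = 9
X = rng.normal(size=(300, p))
y_class = (X[:, 0] > 0).astype(int)
y_reg = X[:, 0] + 0.1 * rng.normal(size=300)

# max_features="sqrt" considers sqrt(p) predictors at each split ...
clf = RandomForestClassifier(max_features="sqrt", random_state=0).fit(X, y_class)
# ... and max_features=p // 3 considers p/3, matching the text.
reg = RandomForestRegressor(max_features=p // 3, random_state=0).fit(X, y_reg)
print(clf.n_features_in_, reg.max_features)
```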
KNN k-nearest neighbors. This simple technique for classification and regression contrasts the performance of customized training with another “local” method. The parameter k is chosen via cross-validation.
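Choosing k by cross-validation, as the KNN entry describes, can be sketched with a grid search; the candidate grid and simulated data below are assumptions for illustration.

```python
# Sketch of the KNN baseline: neighborhood size k chosen by 5-fold CV
# over a small candidate grid (synthetic two-class data).
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

search = GridSearchCV(
    KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7, 9]}, cv=5
)
search.fit(X, y)
print(search.best_params_["n_neighbors"])
```

Like customized training, KNN is "local" in that each prediction depends only on training points near the query, which is why it serves as a natural point of contrast.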