Author manuscript; available in PMC: 2018 Oct 24.
Published in final edited form as: Ann Appl Stat. 2016 Jan 28;9(4):1709–1725. doi: 10.1214/15-AOAS866
ST Standard training. This method uses the ℓ1-penalized regression techniques outlined in Section 2.2, training one model on the entire training set. The regularization parameter λ is chosen through cross-validation.
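As a rough illustration of the ST baseline, the sketch below fits a single ℓ1-penalized (lasso) regression on a whole synthetic training set, with λ chosen by cross-validation; scikit-learn's `LassoCV` and the simulated data are stand-ins, not the paper's code.

```python
# Sketch of ST: one l1-penalized model on all training data,
# lambda chosen by k-fold cross-validation (synthetic data, not the paper's).
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
beta = np.zeros(10)
beta[:3] = [2.0, -1.5, 1.0]                  # sparse true signal
y = X @ beta + rng.normal(scale=0.5, size=200)

# LassoCV scans a grid of regularization strengths via 5-fold CV
# and refits on the full training set at the selected lambda.
model = LassoCV(cv=5).fit(X, y)
print(model.alpha_)                          # the cross-validated lambda
```

The ℓ1 penalty drives many coefficients exactly to zero, so the selected model is sparse as well as regularized.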
SVM Support vector machine. The cost-tuning parameter is chosen through cross-validation.
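The SVM baseline's cost tuning can be sketched as a grid search over C with cross-validation; the RBF kernel, the C grid, and the toy data here are illustrative assumptions, not the paper's configuration.

```python
# Sketch of the SVM baseline: cost parameter C tuned by 5-fold CV
# on hypothetical two-class data.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # linearly separable-ish labels

# GridSearchCV evaluates each candidate cost by cross-validated accuracy
# and refits the best one on the full training set.
search = GridSearchCV(SVC(kernel="rbf"), {"C": [0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)
print(search.best_params_["C"])
```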
KSVM K-means + SVM. We cluster the training data into K clusters via the K-means algorithm and fit an SVM to each training cluster. Test data are assigned to the nearest cluster centroid. This method is a simpler, special case of the clustered SVMs proposed by Gu and Han (2013), whose recommendation of K = 8 we use.
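The KSVM routing described above can be sketched directly: cluster the training data with K-means (K = 8, per Gu and Han), fit one SVM per cluster, and send each test point to the model of its nearest centroid. The majority-vote fallback for single-class clusters is an implementation assumption, not from the paper.

```python
# Sketch of KSVM: K-means clustering + one SVM per training cluster,
# with test points routed to the nearest centroid (synthetic data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 4))
y = (X[:, 0] > 0).astype(int)
X_test = rng.normal(size=(50, 4))

K = 8  # the value recommended by Gu and Han (2013)
km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)

# Fit one SVM per cluster; fall back to the majority class if a
# cluster happens to contain only one label (assumed handling).
models = {}
for k in range(K):
    mask = km.labels_ == k
    if np.unique(y[mask]).size < 2:
        models[k] = int(y[mask].mean() > 0.5)
    else:
        models[k] = SVC(kernel="rbf", C=1.0).fit(X[mask], y[mask])

# Each test point uses the SVM of its nearest cluster centroid.
assign = km.predict(X_test)
pred = np.array([
    m.predict(x.reshape(1, -1))[0] if hasattr(m, "predict") else m
    for x, m in ((x, models[k]) for x, k in zip(X_test, assign))
])
print(pred.shape)
```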
RF Random forests. At each split we consider √p of the p predictor variables (classification) or p/3 of the p predictor variables (regression).
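These per-split subsampling rates (√p for classification, p/3 for regression) correspond to the standard defaults and can be set explicitly, as in this scikit-learn sketch on simulated data:

```python
# Sketch of the RF baseline: sqrt(p) candidate variables per split for
# classification and p/3 for regression (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(3)
p = 9
X = rng.normal(size=(300, p))
y_class = (X[:, 0] > 0).astype(int)
y_reg = X[:, 0] + 0.1 * rng.normal(size=300)

# max_features="sqrt" considers sqrt(p) predictors at each split ...
clf = RandomForestClassifier(max_features="sqrt", random_state=0).fit(X, y_class)
# ... and max_features=p // 3 considers p/3, matching the text.
reg = RandomForestRegressor(max_features=p // 3, random_state=0).fit(X, y_reg)
print(clf.n_features_in_, reg.max_features)
```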
KNN k-nearest neighbors. This simple technique for classification and regression contrasts the performance of customized training with another “local” method. The parameter k is chosen via cross-validation.
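Choosing k by cross-validation, as the KNN entry describes, can be sketched with a grid search; the candidate grid and simulated data below are assumptions for illustration.

```python
# Sketch of the KNN baseline: neighborhood size k chosen by 5-fold CV
# over a small candidate grid (synthetic two-class data).
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

search = GridSearchCV(
    KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7, 9]}, cv=5
)
search.fit(X, y)
print(search.best_params_["n_neighbors"])
```

Like customized training, KNN is "local" in that each prediction depends only on training points near the query, which is why it serves as a natural point of contrast.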