Author manuscript; available in PMC: 2015 Dec 1.
Published in final edited form as: J Psychiatr Res. 2014 Sep 16;59:68–76. doi: 10.1016/j.jpsychires.2014.08.017

Figure 2. Machine Learning Approach for feature selection and classification.

Note: 1. Unselected data are organized such that all potential predictors are normalized to the range 0–1 and a ‘target’ variable is specified; 2+3. The Markov Boundary Feature Selection algorithm removes redundant or uninformative variables to identify an irreducible set of predictors in a random 90% of cases. This set of predictors is then confirmed in a random 10% of cases, and the procedure is repeated 10 times; 4+5. Selected features are fed into seven different classification algorithms to determine how accurately they classify the ‘target’ variable, with accuracy estimated as the area under the receiver operating characteristic curve (AUC). The classification algorithms are tested using the same cross-validation procedure as for feature selection. Additionally, when optimizing parameters of the model (for example, for the polynomial SVMs), each training set is further split into training and validation sets; 6. A mean AUC across 100 cross-validation runs is provided to determine the overall accuracy of the selected and validated features for classifying the target variable (in this analysis, remission vs. non-remission trajectory membership or PTSD diagnostic status). SVM = Support Vector Machine; ROC = Receiver Operating Characteristic.
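
The evaluation loop described in the note (0–1 normalization, 10-fold cross-validation repeated 10 times, and a mean AUC over the resulting 100 runs) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic, the folds are stratified by class so that every test fold contains both outcomes, and a simple difference-of-class-means scorer stands in for the seven classifiers. Markov Boundary feature selection and the inner training/validation split used for parameter tuning are omitted. All function names are hypothetical.

```python
import random
from statistics import mean


def min_max_normalize(column):
    """Rescale one predictor column to the range 0-1 (step 1 of the note)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column carries no information; map it to 0
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in the test fold")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_stratified_kfold(labels, k=10, repeats=10, seed=0):
    """Yield (train, test) index lists for `repeats` shuffled k-fold splits.

    Assumption: folds are stratified by class, which the note does not state.
    """
    rng = random.Random(seed)
    everyone = range(len(labels))
    for _ in range(repeats):
        pos = [i for i in everyone if labels[i] == 1]
        neg = [i for i in everyone if labels[i] == 0]
        rng.shuffle(pos)
        rng.shuffle(neg)
        for fold in range(k):
            test = pos[fold::k] + neg[fold::k]  # about 10% of cases for k=10
            held_out = set(test)
            train = [i for i in everyone if i not in held_out]
            yield train, test


def fit_mean_difference(rows, labels):
    """Stand-in classifier: weight each feature by its class-mean difference."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    n_features = len(rows[0])
    return [mean(r[j] for r in pos) - mean(r[j] for r in neg)
            for j in range(n_features)]


def mean_cv_auc(rows, labels, k=10, repeats=10, seed=0):
    """Mean test-fold AUC over k * repeats runs (100 with the defaults)."""
    aucs = []
    for train, test in repeated_stratified_kfold(labels, k, repeats, seed):
        weights = fit_mean_difference([rows[i] for i in train],
                                      [labels[i] for i in train])
        scores = [sum(w * x for w, x in zip(weights, rows[i])) for i in test]
        aucs.append(auc([labels[i] for i in test], scores))
    return mean(aucs), len(aucs)


# Synthetic data for illustration only: feature 0 is informative, feature 1 is noise.
data_rng = random.Random(1)
labels = [1] * 50 + [0] * 50
raw = [[data_rng.gauss(2.0 * y, 1.0), data_rng.gauss(0.0, 1.0)] for y in labels]
columns = [min_max_normalize(list(col)) for col in zip(*raw)]
rows = [list(r) for r in zip(*columns)]

mean_auc, n_runs = mean_cv_auc(rows, labels)
print(n_runs, round(mean_auc, 3))
```

With the default settings the loop performs 10 folds in each of 10 repeats, which matches the 100 cross-validation runs over which the note reports a mean AUC.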