2016 May 16;11(5):e0155370. doi: 10.1371/journal.pone.0155370

Fig 2. Flow chart of the KA-predictor approach, which includes feature calculation, feature selection and model training.

The flow for the training dataset is shown with grey arrows, and the flow for the test set with black arrows. For the training dataset, 14 subtypes of features are first calculated using various tools, such as PSIBLAST and spider-HSE. The features of each type are then ranked by the Pearson Correlation Coefficient (PCC), and stepwise feature selection is conducted for each type. Five-fold cross-validation is applied during feature selection to evaluate performance on the training dataset. A support vector machine (SVM) classifier, LibSVM, is used to train the parameters and build an accurate prediction model. For the independent test set, or for a sequence supplied by a user, the features are first calculated, the same features that were selected on the training dataset are retained, and the models trained on the training dataset are then used to obtain the predicted output.
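The ranking and stepwise selection steps described in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: every function name and the synthetic data are hypothetical, the acceptance rule (keep a feature only if the cross-validated accuracy improves) is an assumption, and a simple nearest-centroid classifier stands in for the SVM (LibSVM) so that the example runs with the Python standard library alone.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5

def rank_by_pcc(X, y):
    """Feature indices sorted by |PCC| with the label, strongest first."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

def centroid_predict(train_X, train_y, test_X):
    """Stand-in classifier (nearest centroid); the caption describes an SVM."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(r, centroids[c])) for r in test_X]

def cv_accuracy(X, y, features, k=5, seed=0):
    """Mean accuracy over k folds, using only the given feature columns."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sub = [[row[j] for j in features] for row in X]
    accs = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        preds = centroid_predict([sub[i] for i in train],
                                 [y[i] for i in train],
                                 [sub[i] for i in fold])
        accs.append(mean(1.0 if p == y[i] else 0.0 for p, i in zip(preds, fold)))
    return mean(accs)

def stepwise_select(X, y, k=5):
    """Try features in PCC order; keep one only if CV accuracy improves (assumed rule)."""
    selected, best = [], 0.0
    for j in rank_by_pcc(X, y):
        score = cv_accuracy(X, y, selected + [j], k)
        if score > best:
            selected, best = selected + [j], score
    return selected, best

# Synthetic data (hypothetical): feature 0 tracks the label, features 1-2 are noise.
rng = random.Random(1)
y = [i % 2 for i in range(60)]
X = [[label + rng.gauss(0, 0.3), rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]

selected, acc = stepwise_select(X, y)
print("selected features:", selected, "CV accuracy:", round(acc, 3))
```

For a test set or a new input sequence, the same `selected` indices would be applied to the calculated features before prediction, mirroring the black-arrow path in the figure.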