Fig. 2.
Optimization of parameters for applying the kNN algorithm to SNP-syntax (SNP-S) profiles. Applying kNN to SNP-S profiles involves three parameters: (i) the filtering percentage, a frequency threshold below which rare features are selected (for example, with 1% filtering, features whose frequency in the study population is below 1% are selected for analysis); (ii) the length of the SNP-S; and (iii) k, the number of nearest neighbors of a test individual. Training accuracy was measured for several settings of the three parameters, and the setting that gave the highest accuracy was taken as optimal. Accuracy is defined as (TP + TN)/(TP + TN + FP + FN), where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
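The parameter search described in this caption can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the dictionary of feature frequencies, and the `evaluate` callback (assumed to train and score kNN for one setting and return the four confusion counts) are all hypothetical.

```python
from itertools import product


def accuracy(tp, tn, fp, fn):
    """Accuracy as defined in the caption: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion counts sum to zero")
    return (tp + tn) / total


def select_rare_features(frequencies, filter_pct):
    """Keep features whose population frequency is below filter_pct percent.

    `frequencies` (hypothetical input) maps a feature name to its
    frequency in the study population, expressed as a fraction in [0, 1].
    """
    threshold = filter_pct / 100.0
    return [name for name, freq in frequencies.items() if freq < threshold]


def grid_search(evaluate, filter_pcts, lengths, ks):
    """Try every (filter_pct, snp_s_length, k) setting and return the best.

    `evaluate` is a hypothetical callback that returns (TP, TN, FP, FN)
    on the training data for one parameter setting.
    """
    best_setting, best_acc = None, -1.0
    for setting in product(filter_pcts, lengths, ks):
        acc = accuracy(*evaluate(*setting))
        if acc > best_acc:  # ties keep the first setting encountered
            best_setting, best_acc = setting, acc
    return best_setting, best_acc
```

Under these assumptions, a call such as `grid_search(evaluate, [1, 5], [2, 3], [3, 5])` would score eight settings and return the one with the highest training accuracy together with that accuracy.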