Genome Res. 2002 Apr;12(4):656–664. doi: 10.1101/gr.229202

Table 7.

Sensitivity and Specificity of Multiple (2 and 3) Perfect Nucleotide K-mer Matches as a Search Criterion

A. Fraction of homologies detected, by N,K

Identity  2,8    2,9    2,10   2,11   2,12   3,8    3,9    3,10   3,11   3,12
81%       0.681  0.508  0.348  0.220  0.129  0.389  0.221  0.112  0.051  0.021
83%       0.790  0.638  0.475  0.326  0.208  0.529  0.339  0.193  0.099  0.045
85%       0.879  0.762  0.615  0.460  0.318  0.676  0.487  0.313  0.180  0.093
87%       0.942  0.866  0.752  0.611  0.461  0.809  0.649  0.470  0.305  0.177
89%       0.978  0.940  0.868  0.761  0.625  0.910  0.801  0.648  0.476  0.314
91%       0.994  0.980  0.947  0.884  0.787  0.969  0.914  0.815  0.673  0.505
93%       0.999  0.996  0.986  0.962  0.912  0.993  0.976  0.933  0.851  0.722
95%       1.000  1.000  0.998  0.993  0.979  0.999  0.997  0.987  0.961  0.902
97%       1.000  1.000  1.000  1.000  0.999  1.000  1.000  0.999  0.997  0.987

B. Expected chance matches (F), by N,K

N,K       2,8    2,9    2,10   2,11   2,12   3,8    3,9    3,10   3,11   3,12
F         524    27     1.4    0.1    0.0    0.1    0.0    0.0    0.0    0.0

(A) Columns give N,K pairs for N = 2 or 3 and K = 8–12. Rows give the percentage identity between the homologous sequences. Each entry is the fraction of homologies detected, as calculated by equation 10. (B) N and K are the number and size of the perfect matches, respectively. F is the number of perfect clustered matches expected to occur by chance, according to equation 14, in a translated genome of 3 billion bases using a query of 167 amino acids.
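Equation 10 itself is not reproduced here. As a minimal sketch, assume each base of a 100-base homologous region matches independently with probability M (the percentage identity), and that the region is split into T = floor(100/K) non-overlapping K-mer tiles. A tile then matches perfectly with probability p = M^K, and the chance of at least N perfect tiles is the binomial tail 1 - sum over i < N of C(T, i) p^i (1 - p)^(T - i). Under these assumptions the function below matches the part A entries we spot-checked (for example 0.681 for N=2, K=8 at 81% identity). The names `fraction_detected`, `most_sensitive`, `HOMOLOGY_LENGTH` and `CHANCE_MATCHES` are hypothetical. The F values are copied from part B, and the selection rule (highest sensitivity subject to a cap on F) is only one illustrative way to trade sensitivity against specificity.

```python
from math import comb

# Assumption: a 100-base homologous region tiled by non-overlapping k-mers.
HOMOLOGY_LENGTH = 100


def fraction_detected(identity: float, n: int, k: int,
                      length: int = HOMOLOGY_LENGTH) -> float:
    """Probability that at least n of the floor(length / k) non-overlapping
    k-mer tiles match perfectly, if each base matches independently with
    probability `identity`."""
    p = identity ** k        # chance that one k-mer tile is a perfect match
    tiles = length // k      # number of non-overlapping tiles in the region
    # Binomial tail: 1 minus the probability of fewer than n perfect tiles.
    miss = sum(comb(tiles, i) * p ** i * (1 - p) ** (tiles - i)
               for i in range(n))
    return 1.0 - miss


# Expected chance matches F from Table 7B, keyed by (N, K).
CHANCE_MATCHES = {
    (2, 8): 524, (2, 9): 27, (2, 10): 1.4, (2, 11): 0.1, (2, 12): 0.0,
    (3, 8): 0.1, (3, 9): 0.0, (3, 10): 0.0, (3, 11): 0.0, (3, 12): 0.0,
}


def most_sensitive(identity: float, max_chance: float = 1.0):
    """Hypothetical helper: among (N, K) settings whose tabulated F is at most
    max_chance, return the one with the highest modeled sensitivity."""
    allowed = [nk for nk, f in CHANCE_MATCHES.items() if f <= max_chance]
    return max(allowed, key=lambda nk: fraction_detected(identity, *nk))


if __name__ == "__main__":
    # Recompute the 81% row of part A under the stated assumptions.
    for nk in CHANCE_MATCHES:
        print(nk, round(fraction_detected(0.81, *nk), 3))
    print(most_sensitive(0.91))
```

For example, at 91% identity with F capped at 1, this picks N=3, K=8 (0.969) over N=2, K=11 (0.884), even though both have F = 0.1 in part B. Requiring more, shorter matches can therefore raise sensitivity at a similar chance-match rate.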