2002 Apr;12(4):656–664. doi: 10.1101/gr.229202

Table 5.

Sensitivity and Specificity of Single Near-Perfect (One Mismatch Allowed) Nucleotide K-mer Matches as a Search Criterion

A.
Identity  K=12   K=13   K=14   K=15   K=16   K=17   K=18   K=19   K=20   K=21   K=22
81%       0.945  0.880  0.831  0.721  0.657  0.526  0.465  0.408  0.356  0.255  0.218
83%       0.975  0.936  0.904  0.820  0.770  0.649  0.591  0.535  0.480  0.361  0.318
85%       0.991  0.971  0.954  0.900  0.865  0.767  0.719  0.669  0.619  0.490  0.445
87%       0.997  0.990  0.983  0.954  0.935  0.867  0.833  0.796  0.757  0.634  0.591
89%       1.000  0.997  0.995  0.984  0.976  0.939  0.920  0.897  0.872  0.775  0.741
91%       1.000  1.000  0.999  0.996  0.994  0.979  0.971  0.962  0.950  0.890  0.869
93%       1.000  1.000  1.000  0.999  0.999  0.996  0.994  0.991  0.988  0.963  0.954
95%       1.000  1.000  1.000  1.000  1.000  1.000  0.999  0.999  0.999  0.994  0.992
97%       1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

B.
K   12      13      14      15      16      17      18      19      20      21      22
F   275671  68775   17163   4284    1070    267     67      17      4.2     1.0     0.3

(A) Columns are K sizes from 12 to 22; rows are percentage identities between the homologous sequences. Each entry is the fraction of homologies detected, calculated by equation 6 assuming a homologous region of 100 bases. (B) K is the size of the near-perfect match. F is the number of near-perfect matches of this size expected to occur by chance, according to equation 7, in a genome of 3 billion bases using a query of 500 bases.
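Equations 6 and 7 are referenced but not stated in this excerpt. The sketch below is a hedged reconstruction: a simple independence model that reproduces the table entries it was checked against (for example 0.945 at 81% identity and K = 12, 0.741 at 89% identity and K = 22, and F of about 275671 at K = 12). Whether it matches the paper's equations term for term is an assumption. The model assumes the 100-base region contains floor(100 / K) non-overlapping K-mers, each base matches independently with probability equal to the percentage identity, random bases match with probability 1/4, and F uses the full query length of 500 positions rather than 500 − K + 1 (which is what agrees with panel B). The function names `detection_probability` and `expected_chance_hits` are hypothetical.

```python
def detection_probability(identity: float, k: int, region: int = 100) -> float:
    """Probability that a homologous region yields at least one near-perfect K-mer hit.

    Assumed model: the region holds region // k non-overlapping K-mers, each
    base matches independently with probability `identity`, and a K-mer
    counts as a hit if it has at most one mismatch.
    """
    m = identity
    # P(0 mismatches) + P(exactly 1 mismatch at any of the k positions)
    p_hit = m ** k + k * m ** (k - 1) * (1 - m)
    n_kmers = region // k
    # Complement of "no K-mer in the region hits"
    return 1 - (1 - p_hit) ** n_kmers


def expected_chance_hits(k: int, genome: float = 3e9, query: int = 500) -> float:
    """Expected number of near-perfect K-mer matches arising by chance.

    Assumed model: random bases match with probability 1/4, the genome is
    indexed as genome / k non-overlapping K-mers, and each of `query` query
    positions is compared against every indexed K-mer.
    """
    # (1/4)^k for an exact match plus k * (3/4) * (1/4)^(k-1) for one mismatch,
    # which simplifies to (3k + 1) / 4^k
    p_near = (3 * k + 1) / 4 ** k
    return query * (genome / k) * p_near


if __name__ == "__main__":
    ks = range(12, 23)
    # Print a few panel A rows and the panel B row for comparison with the table
    for pct in (81, 85, 89, 93, 97):
        row = [f"{detection_probability(pct / 100, k):.3f}" for k in ks]
        print(f"{pct}%", *row)
    print("F", *(f"{expected_chance_hits(k):.1f}" for k in ks))
```

Under this model the trade-off in the table is explicit: each one-step increase in K divides F by roughly four through the (3K + 1)/4^K term, while fewer K-mers fit in a 100-base region, which is what lowers sensitivity at low identity.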