2008 Mar 3;9:136. doi: 10.1186/1471-2105-9-136

Table 4.

Distribution of clusters and their characteristics for different values of k (the number of clusters), from 500 to 3,000.

| k | 500 | 1,000 | 2,000 | 3,000 |
|---|---|---|---|---|
| Single-species clusters | 422 (84.4%) | 904 (90.4%) | 1897 (94.9%) | 2894 (96.5%) |
| # of phenocopy pairs (of 25) | 25 (100%) | 13 (52%) | 12 (48%) | 8 (32%) |
| Clusters with PT-Sim ≥ 0.4 | 92 (18.4%) | 293 (29.3%) | 526 (26.3%) | 810 (40.5%) |
| # Genes | 3221 | 5886 | 6379 | 6878 |
| Clusters with GO-Sim ≥ 0.4 | 51 (10.2%) | 206 (20.6%) | 522 (26.1%) | 921 (46.1%) |
| Correlation GO-Sim vs PT-Sim | 0.53 | 0.41 | 0.37 | 0.28 |
| # Genes | 863 | 1800 | 2392 | 3065 |
| Clusters with PPi ≥ 75% | 21 (4.2%) | 60 (6.0%) | 174 (8.7%) | 305 (10.2%) |
| # Genes | 1497 | 1858 | 2335 | 2702 |
| Clusters with PPi ≥ 33% | 63 (12.6%) | 138 (13.8%) | 286 (14.3%) | 413 (13.8%) |
| # Genes | 3890 | 4322 | 4965 | 4996 |
| Clusters for GO predictions | 90 (18%) | 196 (19.6%) | 393 (19.7%) | 611 (20.4%) |
| # Genes | 2820 | 3213 | 4145 | 4546 |
| # Terms | 142 | 345 | 730 | 1226 |
| Precision | 72.55% | 67.91% | 63.40% | 60.31% |
| Recall | 16.73% | 22.98% | 25.63% | 28.32% |
| Avg. genes/cluster | 54 | 29 | 16 | 11 |

As an internal measure of cluster quality, we examined how the data structure changes with the choice of k, ranging from 500 to 3,000. Filter 1 was applied for the GO predictions. For details, see text.
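
As a reading aid, the following minimal Python sketch shows how the parenthesized percentages in the table can be recomputed, assuming they express a cluster count as a share of k. The function name `cluster_share` and the dictionary layout are hypothetical and not part of the original analysis; only the single-species counts are taken from the table.

```python
def cluster_share(count: int, k: int) -> float:
    """Return a cluster count as a percentage of the k clusters.

    Assumption: the parenthesized values in the table are count / k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return 100.0 * count / k


# Single-species cluster counts from the table, keyed by k.
single_species = {500: 422, 1000: 904, 2000: 1897, 3000: 2894}

# Recompute the share of single-species clusters for each k.
shares = {k: cluster_share(n, k) for k, n in single_species.items()}

for k, pct in shares.items():
    print(f"k={k}: {pct:.2f}% single-species clusters")
```

The same helper can be applied to any count row to compare the recomputed share with the value reported in the table.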