Table 1.
No. of sequences | Area under PPV–SEN curve (ensemble) | Area under PPV–SEN curve (first cluster) | Area under PPV–SEN curve (second cluster) | Bias | SD | No. of samples (first cluster) | No. of samples (second cluster) | No. of samples (first+second cluster) | 95% credibility limit (ensemble) | 95% credibility limit (first cluster) | 95% credibility limit (second cluster) |
---|---|---|---|---|---|---|---|---|---|---|---|
2 | 0.44 | 0.46 | 0.37 | 0.27 | 0.04 | 728.13 | 150.76 | 878.89 | 0.21 | 0.14 | 0.11 |
3 | 0.58 | 0.59 | 0.49 | 0.20 | 0.03 | 793.15 | 124.94 | 918.09 | 0.14 | 0.10 | 0.07 |
4 | 0.58 | 0.58 | 0.48 | 0.20 | 0.03 | 791.66 | 115.00 | 906.66 | 0.14 | 0.09 | 0.06 |
5 | 0.62 | 0.63 | 0.51 | 0.17 | 0.03 | 802.20 | 113.24 | 915.44 | 0.12 | 0.08 | 0.05 |
6 | 0.67 | 0.67 | 0.54 | 0.16 | 0.03 | 800.50 | 111.66 | 912.16 | 0.11 | 0.07 | 0.05 |
7 | 0.70 | 0.69 | 0.57 | 0.15 | 0.03 | 795.52 | 111.92 | 907.44 | 0.10 | 0.07 | 0.05 |
8 | 0.73 | 0.71 | 0.60 | 0.15 | 0.03 | 797.56 | 116.19 | 913.75 | 0.10 | 0.07 | 0.04 |
9 | 0.73 | 0.73 | 0.60 | 0.14 | 0.02 | 790.59 | 122.38 | 912.97 | 0.09 | 0.06 | 0.04 |
10 | 0.75 | 0.74 | 0.63 | 0.13 | 0.02 | 792.85 | 125.11 | 917.96 | 0.09 | 0.06 | 0.04 |
For each row, we report the average area under the PPV–SEN curve for accuracy comparison, together with the bias-variance statistics and the sizes of the two largest clusters, which characterize the clustering results. To normalize bias, SD and credibility limits with respect to sequence length, we divide them by the average sequence length for the family.
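
The length normalization described in this note can be illustrated with a minimal sketch. The function name `normalize_by_length`, the dictionary keys and the input values are all hypothetical, chosen only for illustration; they are not taken from Table 1 or from the authors' implementation.

```python
from statistics import mean


def normalize_by_length(stats, sequence_lengths):
    """Divide each length-dependent statistic by the family's mean sequence length."""
    avg_len = mean(sequence_lengths)
    return {name: value / avg_len for name, value in stats.items()}


# Hypothetical raw statistics for one family (assumed values, not from Table 1).
raw = {"bias": 20.0, "sd": 4.0, "credibility_limit": 10.0}

# Hypothetical sequence lengths for that family; their mean is 100.
lengths = [90, 100, 110]

normalized = normalize_by_length(raw, lengths)
```

Dividing by the family's average length puts the statistics on a per-position scale, so that families with long and short sequences can be compared in the same table.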