2011 Nov 16;6(11):e27567. doi: 10.1371/journal.pone.0027567

Table 1. Number of sequences retained after redundancy reduction with CD-HIT at each sequence-identity threshold.

| Sequence identity threshold | Positive data (training set) | Positive data (independent test set) | Negative data |
|---|---|---|---|
| 100% (original) | 283 | 99 | 19512 |
| 90% | 274 | 94 | 18897 |
| 80% | 268 | 94 | 18447 |
| 70% | 256 | 94 | 17727 |
| 60% | 242 | 88 | 16710 |
| 50% | 226 | 82 | 15255 |
| 40% | 202 | 80 | 13333 |
| 30% | 173 | 65 | 11113 |
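The counts shrink as the threshold is lowered because more sequences fall within the identity cutoff of a sequence already kept. The sketch below is a simplified illustration of greedy incremental clustering, the general strategy CD-HIT follows: sequences are processed longest first, and each one becomes a new representative only if it is not similar enough to any existing representative. The `identity` function is an assumption. It approximates CD-HIT's alignment-based identity (identical residues divided by the length of the shorter sequence) using `difflib` matching blocks. The toy sequences are hypothetical and do not come from the paper's data.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Approximate fraction of identical residues, relative to the shorter sequence.

    Hypothetical stand-in for CD-HIT's alignment-based identity: it sums the
    sizes of the matching blocks found by difflib instead of running a real
    sequence alignment.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs: list[str], threshold: float) -> list[str]:
    """Greedy incremental clustering in the spirit of CD-HIT.

    Sequences are visited from longest to shortest. A sequence is kept as a
    new representative only if its identity to every representative chosen
    so far is below `threshold`; otherwise it joins an existing cluster.
    """
    representatives: list[str] = []
    for seq in sorted(seqs, key=len, reverse=True):
        if all(identity(seq, rep) < threshold for rep in representatives):
            representatives.append(seq)
    return representatives


if __name__ == "__main__":
    # Hypothetical toy peptides: three near-duplicates and one unrelated sequence.
    toy = ["MKTAYIAKQR", "MKTAYIAKQK", "MKTAYLAKQR", "GGSGGSGGSG"]
    for t in (1.0, 0.9, 0.8):
        print(f"{t:.0%}: {len(greedy_cluster(toy, t))} representatives")
```

With these toy inputs, a 100% threshold keeps all four sequences, while 90% and 80% collapse the three near-duplicates into one cluster and keep two representatives. This mirrors the monotone decrease seen down each column of Table 1. The real tool additionally uses short-word filtering to skip most pairwise comparisons, which this sketch omits for clarity.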