Table 1.
| Dataset | % identity | # full-length | # partial (with stop codon) | # total |
|---|---|---|---|---|
| MPI (New data) | 100 | 6 272 | 15 886 | 28 169 |
| | 98 | 5 778 | 14 893 | 26 992 |
| | 90 | 5 667 | 14 502 | 26 433 |
| JGI + Genoscope (Existing data) | 100 | 6 233 | 15 539 | 23 962 |
| | 98 | 5 360 | 13 365 | 19 890 |
| | 90 | 5 008 | 12 341 | 18 155 |
| MPI + JGI + Genoscope (Combined data) | 100 | 10 778 | 26 068 | 42 665 |
| | 98 | 9 359 | 23 131 | 38 185 |
| | 90 | 8 722 | 21 288 | 35 235 |
Numbers of full-length (with start and stop codons), partial (with stop codon), and total predicted protein sequences in the three datasets after clustering at 100%, 98% and 90% identity.
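As a rough aid for reading the table, the minimal Python sketch below computes how much of each dataset's 100%-identity set survives clustering at 98% and 90% identity. The totals are transcribed from Table 1; the names `TOTALS` and `retained_fraction` are hypothetical, and the sketch only does arithmetic on the reported counts. It makes no assumption about which clustering tool produced them.

```python
# Total predicted protein sequences per dataset at each clustering
# identity threshold (%), transcribed from Table 1 (hypothetical structure).
TOTALS = {
    "MPI": {100: 28169, 98: 26992, 90: 26433},
    "JGI + Genoscope": {100: 23962, 98: 19890, 90: 18155},
    "MPI + JGI + Genoscope": {100: 42665, 98: 38185, 90: 35235},
}


def retained_fraction(counts: dict, threshold: int) -> float:
    """Fraction of the 100%-identity set still present after clustering at `threshold` % identity."""
    return counts[threshold] / counts[100]


if __name__ == "__main__":
    for name, counts in TOTALS.items():
        for t in (98, 90):
            share = retained_fraction(counts, t)
            print(f"{name}: clustering at {t}% identity keeps {share:.1%} of sequences")
```

On these counts the existing JGI + Genoscope set shrinks proportionally more than the new MPI set as the identity threshold is lowered.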