Table 1.
Kingdom | Type | Initial count | Environmental samples | Incomplete sequences | Redundancy reduction | Total in HMM |
Archaea | 5S | 58 | 0 | 0 | 10 | 48 |
Archaea | 16S | 589 | 239 | 471 | 287 | 76 |
Archaea | 23S | 37 | 0 | 18 | 8 | 15 |
Bacteria | 5S | 461 | 0 | 0 | 101 | 360 |
Bacteria | 16S | 12 107 | 1429 | 10 723 | 2485 | 743 |
Bacteria | 23S | 398 | 0 | 155 | 130 | 127 |
Eukaryotes | 5S | 316 | 0 | 0 | 33 | 283 |
Eukaryotes | 18S | 6585 | 24 | 5222 | 836 | 979 |
Eukaryotes | 28S | 157 | 0 | 91 | 8 | 58 |
Environmental samples were excluded because they lack phylogenetic information. Sequences with too many unknown nucleotides at either end were excluded to improve HMM accuracy. Redundancy reduction was performed to reduce bias. Note that these exclusion groups may overlap, so the excluded counts in a row can sum to more than the difference between the initial count and the total. The last column gives the number of sequences used to build each HMM.
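The overlap between exclusion groups can be illustrated with a minimal sketch. It assumes each criterion is tallied independently against the initial set and that a sequence is retained only if no criterion flags it; the function name, the sequence identifiers, and the set-based representation are hypothetical and are not taken from the original pipeline.

```python
def summarize_filtering(initial, environmental, incomplete, redundant):
    """Count sequences flagged by each criterion and those retained.

    All arguments are sets of sequence identifiers (hypothetical).
    A sequence may be flagged by more than one criterion.
    """
    # Only flags on sequences that are actually in the initial set count.
    env = environmental & initial
    inc = incomplete & initial
    red = redundant & initial

    # A sequence is excluded if any criterion flags it.
    excluded = env | inc | red
    retained = initial - excluded

    return {
        "initial": len(initial),
        "environmental": len(env),
        "incomplete": len(inc),
        "redundant": len(red),
        "retained": len(retained),
    }


# Toy data: s2 and s3 are each flagged by two criteria.
summary = summarize_filtering(
    initial={"s1", "s2", "s3", "s4", "s5", "s6"},
    environmental={"s1", "s2"},
    incomplete={"s2", "s3"},
    redundant={"s3", "s4"},
)
print(summary)
```

In this toy case the three per-criterion counts add up to 6, yet only 4 sequences are excluded. The same effect explains rows such as Archaea 16S, where the exclusion columns sum to more than the initial count.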