2007 Apr 22;35(9):3100–3108. doi: 10.1093/nar/gkm160

Table 1.

The initial number of rRNA sequences and the number of sequences excluded for different reasons.

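| Kingdom | Type | Initial count | Environmental samples | Incomplete sequences | Redundancy reduction | Total in HMM |
|---|---|---|---|---|---|---|
| Archaea | 5S | 58 | 0 | 0 | 10 | 48 |
| Archaea | 16S | 589 | 239 | 471 | 287 | 76 |
| Archaea | 23S | 37 | 0 | 18 | 8 | 15 |
| Bacteria | 5S | 461 | 0 | 0 | 101 | 360 |
| Bacteria | 16S | 12 107 | 1429 | 10 723 | 2485 | 743 |
| Bacteria | 23S | 398 | 0 | 155 | 130 | 127 |
| Eukaryotes | 5S | 316 | 0 | 0 | 33 | 283 |
| Eukaryotes | 18S | 6585 | 24 | 5222 | 836 | 979 |
| Eukaryotes | 28S | 157 | 0 | 91 | 8 | 58 |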

Environmental samples were excluded because they lack phylogenetic information. Sequences with too many unknown nucleotides at either end were excluded to improve HMM accuracy. Redundancy reduction was performed to reduce bias. Note that these exclusion groups may overlap, so the excluded counts in a row need not sum to the difference between the initial count and the total. The last column gives the number of sequences used to build each HMM.
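The overlap between exclusion groups can be illustrated with a minimal Python sketch. It is not the procedure used to produce the table: the `Record` type, the `environmental` flag, the `window` and `max_unknown` thresholds, and the use of exact-duplicate removal as a stand-in for redundancy reduction are all assumptions made for illustration. Each sequence is tested against every criterion independently, so a single sequence can be counted in more than one exclusion column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    seq_id: str
    sequence: str
    environmental: bool  # hypothetical flag marking environmental samples


def unknown_at_ends(seq, window=10):
    """Count 'N' characters in the first and last `window` positions."""
    s = seq.upper()
    return s[:window].count("N"), s[-window:].count("N")


def filter_records(records, window=10, max_unknown=3):
    """Tally exclusions per reason; reasons are tested independently.

    `window` and `max_unknown` are illustrative thresholds, not values
    taken from the source. Redundancy is approximated here by exact
    duplicate detection, which is an assumption of this sketch.
    """
    counts = {
        "initial": len(records),
        "environmental": 0,
        "incomplete": 0,
        "redundant": 0,
    }
    kept = []
    seen = set()
    for rec in records:
        head, tail = unknown_at_ends(rec.sequence, window)
        is_env = rec.environmental
        is_incomplete = head > max_unknown or tail > max_unknown
        is_redundant = rec.sequence.upper() in seen
        seen.add(rec.sequence.upper())

        # A record may increment several counters, so groups can overlap.
        counts["environmental"] += is_env
        counts["incomplete"] += is_incomplete
        counts["redundant"] += is_redundant
        if not (is_env or is_incomplete or is_redundant):
            kept.append(rec)
    counts["total"] = len(kept)
    return kept, counts
```

Because the counters are independent, the sum of the three exclusion counts can exceed the number of sequences actually removed, which is the pattern visible in several rows of the table.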