Table 2. Clasnip classification performance of CLso rRNA gene regions.
Group | 16S: # Sample | 16S: Identity (Q5) (%) | 16S: Accuracy (%) | 16-23S IGS: # Sample | 16-23S IGS: Identity (Q5) (%) | 16-23S IGS: Accuracy (%) | 50S rplJ/rplL: # Sample | 50S rplJ/rplL: Identity (Q5) (%) | 50S rplJ/rplL: Accuracy (%) |
---|---|---|---|---|---|---|---|---|---|
A | 10 | 93.4 | 100.0 | 6 | 90.8 | 100.0 | 6 | 97.6 | 100.0 |
B | 9 | 92.4 | 88.9 | 1 | 97.5 | 100.0 | 3 | 97.3 | 100.0 |
C | 11 | 93.7 | 90.9 | 21 | 90.0 | 100.0 | 31 | 100.0 | 100.0 |
Cras1a | 13 | 100.0 | 100.0 | 15 | 95.8 | 93.3 | 18 | 99.8 | 100.0 |
Cras1b | 3 | 100.0 | 100.0 | 3 | 100.0 | 100.0 | 3 | 99.8 | 100.0 |
Cras2 | 3 | 100.0 | 100.0 | 4 | 98.6 | 100.0 | 4 | 99.1 | 100.0 |
D | 10 | 98.2 | 100.0 | 24 | 87.4 | 100.0 | 17 | 96.4 | 100.0 |
E | 5 | 100.0 | 100.0 | 7 | 92.0 | 100.0 | 8 | 98.6 | 100.0 |
F | 1 | 100.0 | 100.0 | – | – | – | 1 | 100.0 | 100.0 |
G | 3 | 98.3 | 100.0 | 3 | 98.9 | 100.0 | 4 | 100.0 | 100.0 |
H | 1 | 100.0 | 100.0 | 1 | 100.0 | 100.0 | 1 | 100.0 | 100.0 |
H-Con | 2 | 100.0 | 100.0 | – | – | – | – | – | – |
U | 1 | 100.0 | 100.0 | 1 | 100.0 | 100.0 | 5 | 99.8 | 100.0 |
Total | 72 | – | 97.2 | 86 | – | 98.8 | 101 | – | 100.0 |
Note:
# Sample is the number of samples with more than 5 SNPs covered in the reference region. Identity (Q5) is the 5% quantile of the estimated identity distribution. A new sample is classified into a group if its identity is greater than that group's Identity (Q5). Accuracy is the ratio of correctly classified samples to all samples with more than 5 SNPs covered in the reference region. A sample is “correctly classified” when the identity to its own group is the highest among all groups.
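
The definitions in the note can be illustrated with a minimal Python sketch. It is not Clasnip's implementation: the function names (`quantile`, `best_group`, `accuracy`, `pooled_accuracy`) are hypothetical, the linear-interpolation quantile is an assumption (the note does not state how the 5% quantile is estimated), and the per-sample identity values in the usage example are invented for illustration. Only the per-group sample counts and accuracies for 16S are taken from the table.

```python
def quantile(values, q):
    """Quantile by linear interpolation between order statistics.
    Assumption: the table note does not specify the estimator."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one identity value")
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def identity_q5(identities):
    """Identity (Q5): the 5% quantile of a group's identity values."""
    return quantile(identities, 0.05)


def best_group(identity_by_group):
    """Group with the highest identity for one sample."""
    return max(identity_by_group, key=identity_by_group.get)


def accuracy(samples):
    """Percentage of samples whose own group has the highest identity.
    `samples` is a list of (true_group, {group: identity}) pairs."""
    correct = sum(1 for true, ids in samples if best_group(ids) == true)
    return 100.0 * correct / len(samples)


def pooled_accuracy(rows):
    """Recompute a Total row from (n_samples, accuracy_percent) rows."""
    correct = sum(round(n * acc / 100) for n, acc in rows)
    total = sum(n for n, _ in rows)
    return total, round(100.0 * correct / total, 1)


# 16S column of Table 2, groups A through U: (# Sample, Accuracy (%)).
ROWS_16S = [
    (10, 100.0), (9, 88.9), (11, 90.9), (13, 100.0), (3, 100.0),
    (3, 100.0), (10, 100.0), (5, 100.0), (1, 100.0), (3, 100.0),
    (1, 100.0), (2, 100.0), (1, 100.0),
]

# Invented identities for two hypothetical samples (not from the paper).
DEMO_SAMPLES = [
    ("A", {"A": 0.97, "B": 0.91}),  # own group highest: correct
    ("B", {"A": 0.95, "B": 0.93}),  # another group highest: incorrect
]

if __name__ == "__main__":
    print(pooled_accuracy(ROWS_16S))
    print(accuracy(DEMO_SAMPLES))
```

Pooling the 16S rows this way gives 70 correct out of 72 samples, consistent with the 97.2% shown in the Total row.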