Table 1.
Classification accuracy [see Equation (16)] on Dataset I for the CSSS (1-NN classifier) and five other models, PhymmBL (Brady and Salzberg, 2009), NBC (Rosen et al., 2011), Kraken (Wood and Salzberg, 2014), RAIphy (Nalbantoglu et al., 2011) and PAUDA (Huson and Xie, 2014), when predicting 147 different viral genera across 266 viral DNA sequences, as a function of the viral fragment length.
Classifier | Full-length genomes accuracy (%) | 1000-bp fragments accuracy (%) | 500-bp fragments accuracy (%) | 100-bp fragments accuracy (%) |
---|---|---|---|---|
CSSS | 91.43 ± 0.99 | 70.02 ± 2.01 | 63.02 ± 1.49 | 35.94 ± 3.31 |
PhymmBL | 86.56 ± 2.19 | 68.90 ± 1.78 | 57.28 ± 2.09 | 29.79 ± 1.66 |
NBC | 74.67 ± 0.64 | 59.06 ± 1.49 | 50.39 ± 2.77 | 34.04 ± 1.53 |
Kraken | 48.47 ± 1.85 | 26.66 ± 1.94 | 23.07 ± 2.19 | 16.26 ± 1.40 |
RAIphy | 42.03 ± 1.56 | 30.72 ± 1.66 | 23.97 ± 1.66 | 14.06 ± 1.17 |
PAUDA | 0.10 ± 0.15 | 6.73 ± 1.40 | 21.22 ± 1.32 | 31.89 ± 2.42 |