Table 1.
Evaluation of accuracy and CPU time of CPC and CONC on three datasets
Dataset | Dataset type | Dataset size^a | Accuracy (CPC) | Accuracy (CONC) | Time in min (CPC) | Time in min (CONC) |
---|---|---|---|---|---|---|
Rfam | Noncoding | 30 770 | 98.62% | 97.12% | 3513 | 46 376 |
RNADB | Noncoding | 3996 | 91.50% | 85.44% | 598 | 7322 |
Embl cds | Coding | 121 914 | 99.08% | 98.70% | 69 116 | 826 210^b |
^a CONC focuses on sequences of at least 80 nucleotides and assumes that shorter sequences are unlikely to have coding potential. CPC does not make this assumption and performs similarly on shorter sequences, but to allow a direct comparison we show results only for sequences of at least 80 nucleotides.
^b Because the required CPU time is long, the dataset was split and run on 24 nodes in parallel. The reported CPU time was the sum of execution time on individual nodes.