Table 5.
Performance test of viral genome annotation tools based on four species
| Species | Software | Genes | CDS | CDS with function | CDS without function | Genes with same start and end position (%)<sup>a</sup> | Genes with same start position | Similarity score<sup>b</sup> |
|---|---|---|---|---|---|---|---|---|
| SARS-CoV-2 | Ref_annotation<sup>c</sup> | 11 | 13 | 4 (30.77%) | 9 (69.23%) | / | / | / |
| | VADR<sup>d</sup> | / | / | / | / | / | / | / |
| | VAPiD | 10 | 10 | 4 (40.00%) | 6 (60.00%) | 9 (81.82%) | 9 (81.82%) | 86% |
| | GeneSAS_genemarkS | 9 | 9 | / | / | 6 (54.55%) | 7 (63.64%) | 61% |
| | GeneSAS_glimmer3 | 12 | 12 | / | / | 8 (72.73%) | 9 (81.82%) | 78% |
| Dengue virus | Ref_annotation | 1 | 1 | 1 | 0 | / | / | / |
| | VADR | 1 | 1 | 1 | 0 | 1 (100%) | 1 (100%) | 100% |
| | VAPiD | 1 | 1 | 1 | 0 | 1 (100%) | 1 (100%) | 100% |
| | GeneSAS_genemarkS | 1 | 1 | / | / | 1 (100%) | 1 (100%) | 100% |
| | GeneSAS_glimmer3 | 2 | 2 | / | / | 1 (100%) | 1 (100%) | 66.67% |
| Hepacivirus C | Ref_annotation | 1 | 3 | 3 | 0 | / | / | / |
| | VADR | 1 | 3 | 3 | 0 | 1 (100%) | 1 (100%) | 100% |
| | VAPiD | 1 | 3 | 3 | 0 | 1 (100%) | 1 (100%) | 100% |
| | GeneSAS_genemarkS | 2 | 2 | / | / | 0 (0%) | 0 (0%) | 0% |
| | GeneSAS_glimmer3 | 3 | 3 | / | / | 1 (100%) | 1 (100%) | 50% |
| Norwalk virus | Ref_annotation | 3 | 3 | 3 | 0 | / | / | / |
| | VADR | 3 | 3 | 3 | 0 | 3 (100%) | 3 (100%) | 100% |
| | VAPiD | 3 | 3 | 3 | 0 | 3 (100%) | 3 (100%) | 100% |
| | GeneSAS_genemarkS | 3 | 3 | / | / | 3 (100%) | 3 (100%) | 100% |
| | GeneSAS_glimmer3 | 3 | 3 | / | / | 3 (100%) | 3 (100%) | 100% |

<sup>a</sup> Percentage = (genes with same start and end position / number of genes in Ref_annotation) × 100.

<sup>b</sup> Similarity score = (genes with same start position / (Total<sub>x</sub> + Total<sub>z</sub>)) × 2 × 100, where Total<sub>x</sub> and Total<sub>z</sub> are the total numbers of genes in the software annotation and the reference annotation, respectively. The formula is taken from BEACON (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4539851/).

<sup>c</sup> Ref_annotation is the reference annotation, taken from NCBI RefSeq.

<sup>d</sup> VADR with its default reference models (Flaviviridae and Caliciviridae) could not annotate the SARS-CoV-2 genome, because no model in the default library met the similarity threshold required for homology-based annotation. VADR can annotate SARS-CoV-2 when its SARS-CoV-2-specific reference model is used.
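
The two formulas in footnotes a and b can be checked against individual table rows with a short calculation. The following is a minimal sketch in Python. The function names `matched_percentage` and `similarity_score` are hypothetical and chosen only for illustration; they are not part of BEACON or of any of the tools compared here.

```python
def matched_percentage(matched_genes: int, reference_genes: int) -> float:
    """Footnote a: share of reference genes reproduced by the software, in percent."""
    return matched_genes / reference_genes * 100


def similarity_score(same_start_genes: int, software_genes: int, reference_genes: int) -> float:
    """Footnote b: same-start genes over the combined gene totals, times 2, in percent."""
    return same_start_genes / (software_genes + reference_genes) * 2 * 100


# VAPiD on SARS-CoV-2: 9 matching genes, 10 predicted genes, 11 reference genes.
print(round(matched_percentage(9, 11), 2))   # 81.82
print(round(similarity_score(9, 10, 11)))    # 86

# GeneSAS_glimmer3 on Dengue virus: 1 matching gene, 2 predicted, 1 reference.
print(round(similarity_score(1, 2, 1), 2))   # 66.67
```

Because the denominator sums the gene counts of both annotations, extra predicted genes lower the score even when every reference gene is recovered, as in the Dengue virus row for GeneSAS_glimmer3.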