2015 Jul 22;4:e08490. doi: 10.7554/eLife.08490

Table 1.

Accuracy of host prediction based on distance (d) between tetranucleotide frequencies of viral and microbial genomes

DOI: http://dx.doi.org/10.7554/eLife.08490.021

| Distance (d) | Predicted | Host order: Correct | Host order: Ratio (%) | Host family: Correct | Host family: Ratio (%) | Host genus: Correct | Host genus: Ratio (%) |
|---|---|---|---|---|---|---|---|
| All reference sequences | | | | | | | |
| d < 4 × 10⁻⁴ | 98 | 97 | 98.98 | 97 | 98.98 | 97 | 98.98 |
| 4 × 10⁻⁴ ≤ d < 1 × 10⁻³ | 10,173 | 9361 | 92.02 | 8971 | 88.18 | 5261 | 51.72 |
| 1 × 10⁻³ ≤ d | 2508 | 1872 | 74.64 | 1757 | 70.06 | 917 | 36.56 |
| Host species excluded | | | | | | | |
| d < 4 × 10⁻⁴ | 21 | 20 | 95.24 | 20 | 95.24 | 20 | 95.24 |
| 4 × 10⁻⁴ ≤ d < 1 × 10⁻³ | 10,003 | 9067 | 90.64 | 8372 | 83.69 | 2992 | 29.91 |
| 1 × 10⁻³ ≤ d | 2755 | 1981 | 71.91 | 1840 | 66.79 | 818 | 29.69 |
| Host genus excluded | | | | | | | |
| d < 4 × 10⁻⁴ | 1 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00 |
| 4 × 10⁻⁴ ≤ d < 1 × 10⁻³ | 9085 | 7303 | 80.39 | 6181 | 68.04 | 0 | 0.00 |
| 1 × 10⁻³ ≤ d | 3693 | 1768 | 47.87 | 1388 | 37.58 | 0 | 0.00 |

For each viral genome, the order, family, and genus of its host were predicted from the taxonomy of the closest microbial genome (based on the mean absolute difference between tetranucleotide frequency vectors) and compared to the order, family, and genus of the actual host (i.e., the taxonomy of the genome with which the virus was identified). These predictions were computed (i) with all microbial genomes, (ii) after excluding all genomes from the host species, and (iii) after excluding all genomes from the host genus. Cases with prediction accuracy above 75% are highlighted in gray.
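
The prediction procedure described in this note can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes overlapping 4-mers counted on a single strand, with windows containing ambiguous bases skipped. The function names (`tetra_freq`, `mean_abs_diff`, `predict_host`, `distance_bin`) and the genome record layout (`seq` and `taxonomy` keys) are hypothetical and chosen only for illustration. The `exclude` argument mimics the "host species excluded" and "host genus excluded" settings, and `distance_bin` reproduces the three distance ranges used in the table.

```python
from itertools import product

# All 256 tetranucleotides in a fixed order, so every vector shares one layout.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def tetra_freq(seq):
    """Return the 256-element tetranucleotide frequency vector of a sequence.

    Assumption: overlapping windows on the given strand only.
    """
    counts = [0] * len(KMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = KMER_INDEX.get(seq[i:i + 4])
        if idx is not None:  # skip windows with N or other ambiguity codes
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(KMERS)


def mean_abs_diff(u, v):
    """Mean absolute difference between two equal-length frequency vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def predict_host(virus_seq, genomes, exclude=None):
    """Return (distance, taxonomy) of the closest microbial genome, or None.

    genomes: list of dicts with 'seq' and 'taxonomy' keys (hypothetical layout).
    exclude: optional predicate on a taxonomy dict; matching genomes are
             dropped, e.g. to leave out the true host's species or genus.
    """
    virus_vec = tetra_freq(virus_seq)
    best = None
    for genome in genomes:
        if exclude is not None and exclude(genome["taxonomy"]):
            continue
        d = mean_abs_diff(virus_vec, tetra_freq(genome["seq"]))
        if best is None or d < best[0]:
            best = (d, genome["taxonomy"])
    return best


def distance_bin(d):
    """Assign a distance to one of the three ranges used in the table."""
    if d < 4e-4:
        return "d < 4e-04"
    if d < 1e-3:
        return "4e-04 <= d < 1e-03"
    return "1e-03 <= d"


# Toy data, invented for illustration only.
toy_genomes = [
    {"seq": "ACGT" * 50,
     "taxonomy": {"order": "O1", "family": "F1", "genus": "G1"}},
    {"seq": "AAAATTTT" * 25,
     "taxonomy": {"order": "O2", "family": "F2", "genus": "G2"}},
]
toy_virus = "ACGT" * 20

with_all = predict_host(toy_virus, toy_genomes)
without_g1 = predict_host(toy_virus, toy_genomes,
                          exclude=lambda tax: tax["genus"] == "G1")
```

On the toy data, `with_all` points to genus `G1`, while `without_g1` falls back to the only remaining genome. For a real benchmark, the predicted order, family, and genus would be compared with the known host taxonomy within each distance range, and the ratio reported as correct predictions divided by total predictions.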