Table 1. Overlap of phage clusters within each data source.
Data source | # clusters | % overlap * | Notes |
---|---|---|---|
‘Earth's virome’ project (44) | 5412 | 57.4% | Over 3000 samples were sequenced; most are environmental samples |
Predicted prophages in human gut (1,42) | 1505 | 18.67% | ∼1700 fecal samples from two gut metagenomic studies (1,42) |
Predicted viral and prophage sequences from complete and draft genomes (36) | 7117 | 18.07% | |
Predicted prophages from NCBI complete genomes (40) | 6964 | 15.4% | All available complete prokaryotic genomes (as of May 2017) |
NCBI reference viral genome database (39) | 776 | 0.64% | |
Predicted prophages from EMBL proGenomes database (41) | 3275 | 0.61% | Representative complete prokaryotic genomes (as of May 2017) |
ICTV | 668 | 0% | Data obtained from the International Committee on Taxonomy of Viruses (ICTV; https://talk.ictvonline.org) |
* Within each data source, the overlap ratio is defined as the proportion of phage clusters containing multiple sequences from that data source, out of all phage clusters containing at least one sequence from the same data source.
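
The overlap ratio defined in the footnote can be illustrated with a minimal sketch. It assumes each cluster is represented as a list of data-source labels, one per member sequence. The function name `overlap_ratio`, the cluster identifiers, and the labels `"A"` and `"B"` are hypothetical and are not taken from the original analysis.

```python
def overlap_ratio(clusters, source):
    """Fraction of clusters with >= 2 sequences from `source`,
    among clusters with >= 1 sequence from `source`.

    `clusters` maps a cluster id to a list of data-source labels,
    one label per member sequence (an assumed representation).
    """
    with_any = 0        # clusters containing at least one sequence from `source`
    with_multiple = 0   # clusters containing two or more sequences from `source`
    for members in clusters.values():
        n = sum(1 for label in members if label == source)
        if n >= 1:
            with_any += 1
        if n >= 2:
            with_multiple += 1
    # A source with no clusters at all has no defined ratio; return 0.0 here.
    return with_multiple / with_any if with_any else 0.0


# Toy data, invented for illustration only.
toy_clusters = {
    "c1": ["A", "A", "B"],
    "c2": ["A"],
    "c3": ["B", "B"],
    "c4": ["A", "A"],
}

ratio_a = overlap_ratio(toy_clusters, "A")  # 2 of 3 clusters containing "A"
ratio_b = overlap_ratio(toy_clusters, "B")  # 1 of 2 clusters containing "B"
```

In this toy example, source "A" appears in three clusters, two of which hold more than one "A" sequence, so its overlap ratio is 2/3. Multiplying by 100 gives a percentage comparable to the "% overlap" column.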