TABLE 1.
Statistic | Illumina assembly (SRa) | PacBio assembly (LRa) | PacBio CCS15 reads (LR) |
---|---|---|---|
Starting sequences | 149,018 | 19,982 | 1,535,891 |
Putative phages (VIBRANT) | 10,979 | 947 | 50,296 |
95% identity clustering | 10,979 | 947 | 42,156 |
Unique sequencesᵃ | 5,886 | 36 | 30,203 |
Nucleotides sequenced (Gbp) | 23.4 | 31.0 | 7.6 |
Unique sequences/Gbp sequenced | 251.53 | 1.16 | 3,974 |
Unique sequences (versus GOV2)ᵇ | 4,196 | 35 | 26,766 |
No. complete (high quality)ᶜ | 9 (53) | 15 (114) | 0 (27) |
Min–max sequence length (bp) | 1,000–188,349 | 1,353–428,169 | 1,011–17,836 |
Avg sequence length (bp) | 4,906 | 32,260 | 5,261 |
Min–max GC content (%) | 19.40–65.25 | 19.56–69.93 | 14.25–86.03 |
Avg GC content (%) | 35.45 | 36.9 | 38.13 |
Total proteinsᵈ | 80,487 | 41,599 | 330,157 |
Unique terminase (terL) proteins | 30 | 2 | 393 |
Avg proteins/sequence | 7.33 | 43.92 | 7.83 |
Avg protein length (aa) | 190.29 | 223.42 | 177.9 |
ᵃ Sequences not present in the other data sets (BLASTN, 95% identity; coverage of at least 70% of the smallest sequence).
ᵇ Sequences not present in the other data sets or the Global Ocean Virome 2.0 (GOV2) (BLASTN, 95% identity; coverage of at least 70% of the smallest sequence).
ᶜ VIBRANT defines a high-quality sequence as one that likely contains the majority of a virus’s complete genome (~70% completeness).
ᵈ Values shown here represent protein numbers after dereplication (CD-HIT, 95% identity).
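
The "Unique sequences/Gbp sequenced" row appears to be the unique-sequence count divided by the nucleotides sequenced. The following minimal sketch reproduces that arithmetic from the rounded values in the table. The dictionary layout and the function name `unique_per_gbp` are illustrative assumptions, not part of the original analysis, and the results should agree with the table only to within rounding of the Gbp figures.

```python
# Values copied from Table 1: (unique sequences, Gbp sequenced) per data set.
# The keys follow the column abbreviations used in the table header.
table = {
    "SRa": (5886, 23.4),   # Illumina assembly
    "LRa": (36, 31.0),     # PacBio assembly
    "LR": (30203, 7.6),    # PacBio CCS15 reads
}


def unique_per_gbp(unique_sequences, gbp_sequenced):
    """Return unique putative phage sequences recovered per Gbp sequenced."""
    return unique_sequences / gbp_sequenced


# Hypothetical helper output: one yield value per data set.
yields = {name: unique_per_gbp(*row) for name, row in table.items()}

for name, value in yields.items():
    print(f"{name}: {value:.2f} unique sequences/Gbp")
```

Computed this way, the unassembled long reads (LR) yield roughly 16 times more unique sequences per Gbp than the Illumina assembly (SRa), which is the comparison the row is meant to support.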