Table 1. Assembly and sequencing statistics.
Sequenced reads (72 bp) | 255 511 260 (18.2 Gb) |
Reads after quality filtering | 205 144 342 (12.5 Gb) |
Contigs ⩾300 bp | 269 385 |
Total assembly length | 145 294 146 |
Reads used^a | 32 025 362 |
Average contig coverage^a | 14 |
N50^b | 522 |
Average contig length^b | 540 |
Maximum contig length | 32 884 |
Predicted ORFs | 338 863 |
Average ORF length (s.d.; bp) | 363±262 |
Assigned putative function | 209 413 |
Abbreviation: ORF, open reading frame.
N50 is the length of the smallest contig in the set that contains the fewest (largest) contigs whose combined length represents at least 50% of the assembly (Miller et al., 2010). The number of ORFs assigned a putative function was determined from a BLASTP search against a subset of the NCBI nr database containing all bacterial, archaeal and viral proteins, using an expect value (e-value) cutoff of 1e−5.
^a Based on a reference mapping of reads to contigs with the criteria of 95% identity over 90% of the read length.
^b Calculated on the basis of contigs ⩾300 bp.
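
The N50 definition in the table notes can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `n50`, the `min_length` parameter (mirroring the ⩾300 bp filter) and the toy contig lengths are assumptions made for illustration only.

```python
def n50(contig_lengths, min_length=300):
    """Return the N50 of the contigs that are at least min_length bp long.

    Hypothetical helper: min_length mirrors the >=300 bp filter stated
    in the table notes; it is not taken from the authors' code.
    """
    # Keep only contigs passing the length filter, largest first.
    lengths = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    if not lengths:
        raise ValueError("no contigs pass the length filter")

    half_total = sum(lengths) / 2
    running = 0
    # Add contigs from largest to smallest until half of the assembly
    # length is covered; the last contig added is the N50.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy lengths (made up, unrelated to the assembly in Table 1).
toy_lengths = [120, 300, 400, 500, 800, 1000]
print(n50(toy_lengths))  # prints 800
```

In the toy example the 120 bp contig is filtered out, the remaining lengths sum to 3000 bp, and the two largest contigs (1000 + 800 bp) already cover half of that, so the N50 is 800 bp.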