Table 4.
Assembly Statistics of Isolate Genome Sequences<sup>a</sup>.
Strain name<sup>*</sup> | Origin of N. vectensis host: date of isolation | No. of post-QC sequence pairs (RPs); adaptor-trimmed sequences (ATRs); (estimated genome coverage) | Genome size in base pairs (GC%) | No. of contigs, (N50) and largest contig (bp) | No. of annotated ORFs<sup>b</sup> | No. of unique NOG/COG<sup>c</sup> annotations |
---|---|---|---|---|---|---|
Po_B4 | Thompson Lab, MIT: Spring 2008 | 517,635 RPs 138,192 ATRs (15.5×) | 5,410,491 (64.1%) | 6720 (1075) 10,216 | 6781 | 2206 |
Po_Gab | Thompson Lab, MIT: 2/3/10 | 822,162 RPs 220,986 ATRs (25.3×) | 5,196,558 (64.9%) | 4429 (1695) 33,942 | 6372 | 2252 |
Po_Is | Thompson Lab, MIT: 2/8/10 | 629,033 RPs 164,238 ATRs (19.4×) | 5,288,085 (64.6%) | 5282 (1391) 9535 | 6553 | 2292 |
Po_47 | Finnerty Lab, Boston Univ.: Fall 2009 | 608,695 RPs 150,484 ATRs (18.6×) | 5,254,749 (64.6%) | 5547 (1314) 13,599 | 6359 | 2203 |
Lt_F1 | Marsh, Sippewissett, MA: 4/15/10 | 1,567,436 RPs 222,958 ATRs (82.1×) | 3,447,759 (51.7%) | 203 (30,681) 93,380 | 3580 | 1666 |
Lt_FCMA | Marsh, Sippewissett, MA: 4/15/10 | 674,674 RPs 99,317 ATRs (38.3×) | 3,212,319 (52.3%) | 559 (10,292) 31,966 | 3726 | 1571 |
Rr_D5 | Thompson Lab, MIT: 2/3/10 | 509,968 RPs 148,050 ATRs (15.8×) | 5,376,246 (59.1%) | 6285 (1151) 18,386 | 7685 | 2094 |
Rr_D8 | Thompson Lab, MIT: Spring 2008 | 613,428 RPs 170,015 ATRs (18.4×) | 5,488,699 (59.1%) | 5624 (1395) 16,049 | 7543 | 2126 |
Rr_Is | Thompson Lab, MIT: 2008 | 681,511 RPs 203,168 ATRs (20.5×) | 5,430,112 (59.3%) | 4783 (1626) 10,588 | 7323 | 2141 |
Ss_F1 | Marsh, Sippewissett, MA: 4/15/10 | 717,642 RPs 231,719 ATRs (25.3×) | 4,420,534 (65.3%) | 3410 (1932) 13,551 | 5487 | 1756 |
<sup>*</sup> Pseudomonas oleovorans (Po), Limnobacter thiooxidans (Lt), Rhizobium radiobacter (Rr), Stappia stellulata (Ss).
<sup>a</sup> Genome assemblies were carried out in CLC Genomics Workbench version 4. Genome size, N50, and largest contig size were calculated by CLC Genomics Workbench. Genome coverage was calculated as the ratio of the bases of Illumina reads assembled in CLC (Mb) to the predicted genome size (Mb).
<sup>b</sup> Open reading frames (ORFs) were identified and annotated via the Rapid Annotations using Subsystems Technology (RAST) server (Aziz et al., 2008).
<sup>c</sup> ORFs were assigned to orthologous groups in the eggNOG Database (v3.0) (Powell et al., 2012) based on BLASTP similarity searches (Altschul et al., 1997), requiring an e-value < 1e-20 and an aligned portion that includes the predicted functional residues of the protein (as designated in the COG/NOG database).
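
The coverage ratio and the N50 statistic described in the footnotes can be illustrated with a minimal Python sketch. This is not the CLC Genomics Workbench implementation, which may differ in detail. The function names `coverage` and `n50` and the input values are hypothetical and are not taken from the table.

```python
def coverage(assembled_read_bases, genome_size):
    """Estimated genome coverage: assembled read bases / predicted genome size.

    Both arguments must use the same unit (e.g. bp or Mb).
    """
    return assembled_read_bases / genome_size


def n50(contig_lengths):
    """Common N50 definition (assumed here): the length of the contig at which
    the running total, taken from longest to shortest, first reaches half of
    the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths is empty")


# Hypothetical inputs, for illustration only.
print(coverage(100e6, 5e6))      # 100 Mb of assembled reads over a 5 Mb genome
print(n50([10, 8, 6, 4, 2]))     # toy contig lengths
```

With these toy inputs the coverage is 20× and the N50 is 8, because the two longest contigs (10 + 8 = 18) are the first to reach half of the total length of 30.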