Table 1. Phylogenetic affiliations of major bins in the TA data set identified with the composition-based classifier, PhyloPythia (McHardy et al., 2007)a.
Phylogenetic affiliation | No. of DNA contigs | Total sequence (Mb) | Average read depth | Expected genome size (Mb)b |
---|---|---|---|---|
Bacteroidetes (class) | 70 | 0.155 | 2.3±1.1 | — |
Bacteroidales | 120 | 0.235 | 2.2±0.9 | — |
Betaproteobacteria | 73 | 0.096 | 1.7±0.6 | — |
Deltaproteobacteria | 160 | 0.343 | 2.1±0.8 | — |
Uncultured Syntrophus | 196 | 0.724 | 2.9±1.3 | 1.9 |
Geobacter | 264 | 0.578 | 2.1±0.7 | — |
Firmicutes | 430 | 0.628 | 2.1±0.7 | — |
Clostridia | 66 | 0.372 | 3.3±1.6 | — |
Uncultured Pelotomaculum sppc | 1083 | 4.256 | 3.2±1.5 | 3.6 |
OP5 | 228 | 1.411 | 3.8±1.9 | 2.8 |
Spirochaetes (class) | 81 | 0.177 | 2.3±0.9 | — |
Euryarchaeota | 1560 | 2.648 | 2.1±0.9 | — |
Thermoplasmata | 71 | 0.148 | 1.9±0.8 | — |
Methanomicrobiales | 36 | 0.098 | 2.4±1.8 | — |
Uncultured Methanolinea sppc | 78 | 2.162 | 5.3±3.9 | 3.7 |
Methanosarcinales | 15 | 0.095 | 3.7±1.9 | — |
Uncultured Methanosaeta | 1180 | 2.613 | 2.6±1.1 | 3.1 |
Uncultured Methanosaeta | 351 | 2.361 | 4.2±1.3 | 2.8 |
Unclassified | 46 280 | | | |
Abbreviations: rRNA, ribosomal RNA; SNP, single-nucleotide polymorphism; TA, terephthalate.
No DNA contigs were binned to WWE1 related to C. acidaminovorans because of insufficient training data for PhyloPythia. The 16S rRNA clone library indicated that 132 sequences were affiliated with WWE1 and grouped into two different clusters. One of the clusters (37/132) was closely related to C. acidaminovorans (similarity = 96–98.8%).
Expected genome size was calculated from the percent coverage of the corresponding isolate genomes. For example, 1735 genes in Pelotomaculum thermopropionicum are best-BLAST matches to genes from the metagenome data set. Given that P. thermopropionicum contains 2920 genes, we estimate the total genome size of the uncultured Pelotomaculum spp. to be 7.16 Mb (4.256 × 2920/1735). However, at least two strains of Pelotomaculum are present in the sample, so the genome size of each individual strain is estimated to be around 3.6 Mb. To estimate the Methanolinea genome size, we used Methanoculleus marisnigri as the reference genome. For Methanosaeta, 2.6 Mb of sequence gives hits to 1438 proteins in M. thermophila, whose genome has 1730 predicted coding sequences, so the expected genome size is (2.6 × 1730)/1438 = 3.1 Mb. In the case of OP5, expected genome size was calculated from the occurrence of phylogenetic marker clusters of orthologous genes (COGs), defined as COGs that have one, or mostly one, member in the genomes available in Integrated Microbial Genome/Microbiome (IMG/M). The OP5 bin contained 91 of 180 phylogenetic marker COGs.
At least two species/strains were observed in each bin. With SNP frequencies of at least 0.03–0.07% (data not shown), we concluded that these species/strains are not clonal.
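The expected genome sizes in Table 1 follow a simple proportional scaling, which can be sketched as below. This is a minimal illustration, not the authors' code: the function name `expected_genome_size` and its parameters are hypothetical. Applying the same scaling to the OP5 bin total (1.411 Mb) with the marker COG counts is an assumption; it reproduces the 2.8 Mb reported in the table, but the footnote does not spell out that step.

```python
def expected_genome_size(bin_size_mb, reference_total, reference_matched, n_strains=1):
    """Scale the binned sequence by the inverse of the reference fraction recovered.

    bin_size_mb       -- total sequence assigned to the bin (Mb)
    reference_total   -- genes (or marker COGs) in the reference set
    reference_matched -- how many of those are matched by the bin
    n_strains         -- number of co-binned strains sharing the sequence
    """
    if reference_total <= 0 or reference_matched <= 0 or n_strains <= 0:
        raise ValueError("counts and strain number must be positive")
    return bin_size_mb * reference_total / reference_matched / n_strains


# Pelotomaculum: 4.256 Mb binned, 1735 of 2920 reference genes matched,
# split across the two strains assumed in the footnote.
pelotomaculum = expected_genome_size(4.256, 2920, 1735, n_strains=2)

# Methanosaeta: 2.6 Mb of sequence, 1438 of 1730 coding sequences matched.
methanosaeta = expected_genome_size(2.6, 1730, 1438)

# OP5 (assumption): bin total of 1.411 Mb scaled by 91 of 180 marker COGs.
op5 = expected_genome_size(1.411, 180, 91)

for name, size in [("Pelotomaculum", pelotomaculum),
                   ("Methanosaeta", methanosaeta),
                   ("OP5", op5)]:
    print(f"{name}: {size:.1f} Mb")
```

Rounded to one decimal place, the three values are 3.6, 3.1 and 2.8 Mb, matching the corresponding entries in the expected genome size column.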