Table 1. Large-scale environmental sequencing projects: properties and scope.

| property | acid mine drainage | Sargasso Sea^a | farm soil | whale falls |
|---|---|---|---|---|
| particle size filtering | none | >0.1 μm; <0.8 μm | none | none |
| number of subsamples | 1 | 4^a | 1 | 3 |
| total amount sequenced (raw) | 124 Mbp | 1687 Mbp | 208 Mbp | 116 Mbp |
| total amount sequenced (quality filtered) | 76 Mbp | 1350 Mbp | 104 Mbp^b | 78 Mbp |
| average read length (raw) | 996 bp | 1015 bp | 1046 bp | 993 bp |
| average read length (quality filtered) | 737 bp | 818 bp | 696 bp | 673 bp |
| fraction of reads failing to assemble | ∼20% | ∼40% | >99% | ∼55% |
| genomes reported as largely assembled | 5 | 3 | none | none |
| number of ORFs annotated | >12 000 | >1 000 000 | >180 000 | >120 000 |
| minimum number of species found | 5 | 1000 | 847^c | 17^c,d |
| estimated total number of species | not reported | >1800 | >3000 | 25–150^d |
| reference | (Tyson et al. 2004) | (Venter et al. 2004) | (Tringe et al. 2005) | (Tringe et al. 2005) |

^a Not including data from the Sorcerer II expedition: these data (samples 5–7) were not considered in the original publication (Venter et al. 2004) for the pooled assembly; in addition, they were generated using a variety of different filtering protocols.

^b Filtering here included removal of redundant reads generated by library amplification prior to cloning.

^c ‘Ribotypes’; species defined as having 97% identical rRNA sequences.

^d Depending on the subsample studied.
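
Because an average read length is total sequence divided by number of reads, the read counts implied by the table can be back-calculated, together with the fraction of sequence retained after quality filtering. The sketch below does this arithmetic; the dictionary keys and the `summarize` helper are hypothetical names, and the values are transcribed from the table, so the results are only as precise as the rounded figures reported. Note that for farm soil (footnote b) the filtered total also reflects removal of redundant reads, not only quality trimming.

```python
# Values transcribed from Table 1: totals in Mbp, average read lengths in bp.
TABLE = {
    "acid mine drainage": {"raw_mbp": 124, "filt_mbp": 76, "raw_len": 996, "filt_len": 737},
    "Sargasso Sea": {"raw_mbp": 1687, "filt_mbp": 1350, "raw_len": 1015, "filt_len": 818},
    "farm soil": {"raw_mbp": 208, "filt_mbp": 104, "raw_len": 1046, "filt_len": 696},
    "whale falls": {"raw_mbp": 116, "filt_mbp": 78, "raw_len": 993, "filt_len": 673},
}


def summarize(row):
    """Return (fraction of bases retained, implied raw reads, implied filtered reads).

    Read counts follow from: number of reads = total bases / average read length.
    """
    retained = row["filt_mbp"] / row["raw_mbp"]
    raw_reads = row["raw_mbp"] * 1_000_000 / row["raw_len"]
    filt_reads = row["filt_mbp"] * 1_000_000 / row["filt_len"]
    return retained, raw_reads, filt_reads


if __name__ == "__main__":
    for site, row in TABLE.items():
        retained, raw_reads, filt_reads = summarize(row)
        print(f"{site}: {retained:.0%} of bases kept, "
              f"~{raw_reads:,.0f} raw reads, ~{filt_reads:,.0f} filtered reads")
```

For instance, the farm soil data keep exactly half of the sequenced bases (104 of 208 Mbp), and the acid mine drainage figures imply roughly 124 500 raw reads.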
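
Footnote c defines species operationally as ribotypes sharing 97% rRNA sequence identity. A minimal sketch of greedy centroid clustering at that threshold follows. It assumes the rRNA sequences have already been aligned to equal length and treats identity as the fraction of matching aligned positions; this is a simplification for illustration, not the procedure used in the cited studies, and `identity` and `greedy_ribotypes` are hypothetical names.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and pre-aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_ribotypes(seqs, threshold=0.97):
    """Greedy centroid clustering (assumption: identity >= threshold means same ribotype).

    Each sequence joins the first existing cluster whose founding sequence it matches
    at or above the threshold; otherwise it founds a new cluster.
    Returns a list of clusters, each a list of sequence indices.
    """
    centroids = []  # index of the sequence that founded each cluster
    clusters = []   # member indices for each cluster, parallel to centroids
    for i, s in enumerate(seqs):
        for c, members in zip(centroids, clusters):
            if identity(seqs[c], s) >= threshold:
                members.append(i)
                break
        else:
            centroids.append(i)
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    base = "ACGT" * 25                 # 100 aligned positions
    close = "TT" + base[2:]            # 2 mismatches -> 98% identity
    distant = "NNNNN" + base[5:]       # 5 mismatches -> 95% identity
    print(greedy_ribotypes([base, close, distant]))
```

Because clusters are seeded greedily, the result can depend on input order; the number of clusters obtained this way is a count of ribotypes, which is why the table reports it as a minimum number of species.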